Cross-References to Related Applications

This application is a continuation-in-part of U.S. provisional patent application serial no. 
60/068,755, filed December 23, 1997, and of U.S. provisional patent application serial no. 
60/080,664, filed April 3, 1998, and of U.S. provisional patent application serial no. 60/105,234, 
filed October 21, 1998, each of which applications is incorporated herein by reference.

Field of the Invention 

The present invention relates to novel polynucleotides, particularly to novel 
polynucleotides of human origin that are expressed in a selected cell type, are differentially 
expressed in one cell type relative to another cell type (e.g., in cancerous cells, or in cells of a 
specific tissue origin) and/or share homology to polynucleotides encoding a gene product having 
an identified functional domain and/or activity. 

Background of the Invention 

Identification of novel polynucleotides, particularly those that encode an expressed gene 
product, is important in the advancement of drug discovery, diagnostic technologies, and the 
understanding of the progression and nature of complex diseases such as cancer. Identification 
of genes expressed in different cell types isolated from sources that differ in disease state or 
stage, developmental stage, exposure to various environmental factors, the tissue of origin, the 
species from which the tissue was isolated, and the like is key to identifying the genetic factors 
that are responsible for the phenotypes associated with these various differences.

This invention provides novel human polynucleotides, the polypeptides encoded by 
these polynucleotides, and the genes and proteins corresponding to these novel polynucleotides. 

Summary of the Invention

This invention relates to novel human polynucleotides and variants thereof, their 
encoded polypeptides and variants thereof, to genes corresponding to these polynucleotides and 
to proteins expressed by the genes. The invention also relates to diagnostic and therapeutic 
agents employing such novel human polynucleotides, their corresponding genes or gene 
products, e.g., these genes and proteins, including probes, antisense constructs, and antibodies. 

Accordingly, in one embodiment, the present invention features a library of 
polynucleotides, the library comprising the sequence information of at least one of SEQ ID 
NOS: 1-844. In related aspects, the invention features a library provided on a nucleic acid array, 
or in a computer-readable format. 

In one embodiment, the library comprises a differentially expressed polynucleotide
comprising a sequence selected from the group consisting of SEQ ID NOS:9, 39, 42, 52, 62, 74, 
119, 172, 317, and 379. In specific related embodiments, the library comprises: 1) a 
polynucleotide that is differentially expressed in a human breast cancer cell, where the 
polynucleotide comprises a sequence selected from the group consisting of SEQ ID NOS: 4, 9, 
39, 42, 52, 62, 65, 66, 68, 74, 81, 114, 123, 144, 130, 157, 162, 172, 178, 183, 202, 214, 219,
223, 258, 298, 317, 338, 379, 384, 386, and 388; 2) a polynucleotide differentially expressed in 
a human colon cancer cell, where the polynucleotide comprises a sequence selected from the 
group consisting of SEQ ID NOS: 1, 39, 52, 97, 119, 134, 172, 176, 241, 288, 317, 357, 362, 
and 374; or 3) a polynucleotide differentially expressed in a human lung cancer cell, where the 
polynucleotide comprises a sequence selected from the group consisting of SEQ ID NOS: 9, 34, 
42, 62, 74, 106, 119, 135, 154, 160, 260, 308, 323, 349, 361, 369, 371, 379, 395, 381, and 400.

In another aspect, the invention features an isolated polynucleotide comprising a 
nucleotide sequence having at least 90% sequence identity to an identifying sequence of SEQ ID 
NOS: 1-844 or a degenerate variant thereof. In related aspects, the invention features 
recombinant host cells and vectors comprising the polynucleotides of the invention, as well as 
isolated polypeptides encoded by the polynucleotides of the invention and antibodies that 
specifically bind such polypeptides. 



In one embodiment, the invention features an isolated polynucleotide comprising a 
sequence encoding a polypeptide of a protein family selected from the group consisting of: 
4 transmembrane segments integral membrane proteins, 7 transmembrane receptors, ATPases 
associated with various cellular activities (AAA), eukaryotic aspartyl proteases, GATA family 
of transcription factors, G-protein alpha subunit, phorbol esters/diacylglycerol binding proteins, 
protein kinase, protein phosphatase 2C, protein tyrosine phosphatase, trypsin, wnt family of 
developmental signaling proteins, and WW/rsp5/WWP domain containing proteins. In a 
specific related embodiment, the invention features a polynucleotide comprising a sequence of 
one of SEQ ID NOS: 24, 41, 101, 157, 291, 305, 315, 341, 63, 116, 134, 136, 151, 384, 404,
308, 213, 367, 188, 251, 202, 315, 367, 397, 256, 382, 169, 23, 291, 324, 330, 341, 353, 188,
379, and 395.

In another embodiment, the invention features a polynucleotide comprising a sequence 
encoding a polypeptide having a functional domain selected from the group consisting of: Ank 
repeat, basic region plus leucine zipper transcription factors, bromodomain, EF-hand, SH3 
domain, WD domain/G-beta repeats, zinc finger (C2H2 type), zinc finger (CCHC class), and 
zinc-binding metalloprotease domain. In a specific related embodiment, the invention features a 
polynucleotide comprising a sequence of one of SEQ ID NOS: 116, 251, 374, 97, 136, 242, 379,
306, 386, 18, 335, 61, 306, 386, 322, 306, and 395. 

In another aspect, the invention features a method of detecting differentially expressed 
genes correlated with a cancerous state of a mammalian cell, where the method comprises the 
step of detecting at least one differentially expressed gene product in a test sample derived from 
a cell suspected of being cancerous, where the gene product is encoded by a gene corresponding 
to a sequence of at least one of SEQ ID NOS: 4, 9, 39, 42, 52, 62, 65, 66, 68, 74, 81, 114, 123,
144, 130, 157, 162, 172, 178, 183, 202, 214, 219, 223, 258, 298, 317, 338, 379, 384, 386, 388,
1, 39, 52, 97, 119, 134, 172, 176, 241, 288, 317, 357, 362, 374, 9, 34, 42, 62, 74, 106, 119, 135,
154, 160, 260, 308, 323, 349, 361, 369, 371, 379, 395, 381, and 400. Detection of the 
differentially expressed gene product is correlated with a cancerous state of the cell from which 
the test sample was derived. In one embodiment, the detecting is by hybridization of the test
sample to a reference array, wherein the reference array comprises an identifying sequence of at
least one of SEQ ID NOS: 1-844.

In one embodiment of the method of the invention, the cell is a breast tissue derived cell, 
and the differentially expressed gene product is encoded by a gene corresponding to a sequence 
of at least one of SEQ ID NOS: 4, 9, 39, 42, 52, 62, 65, 66, 68, 74, 81, 114, 123, 144, 130, 157,
162, 172, 178, 183, 202, 214, 219, 223, 258, 298, 317, 338, 379, 384, 386, and 388. 

In another embodiment of the method of the invention, the cell is a colon tissue derived 
cell, and the differentially expressed gene product is encoded by a gene corresponding to a sequence
of at least one of SEQ ID NOS: 1, 39, 52, 97, 119, 134, 172, 176, 241, 288, 317, 357, 362, and
374.

In yet another embodiment of the method of the invention, the cell is a lung tissue
derived cell, and the differentially expressed gene product is encoded by a gene corresponding to a
sequence of at least one of SEQ ID NOS: 9, 34, 42, 62, 74, 106, 119, 135, 154, 160, 260, 308,
323, 349, 361, 369, 371, 379, 395, 381, and 400.

Other aspects and embodiments of the invention will be readily apparent to the ordinarily
skilled artisan upon reading the description provided herein.

Detailed Description of the Invention

The invention relates to polynucleotides comprising the disclosed nucleotide sequences, 
to full length cDNA, mRNA and genes corresponding to these sequences, and to polypeptides
and proteins encoded by these polynucleotides and genes. 

Also included are polynucleotides that encode polypeptides and proteins encoded by the 
polynucleotides of the Sequence Listing. The various polynucleotides that can encode these 
polypeptides and proteins differ because of the degeneracy of the genetic code, in that most 
amino acids are encoded by more than one triplet codon. The identity of such codons is well
known in this art, and this information can be used for the construction of the polynucleotides
within the scope of the invention. 
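
The degeneracy described above can be illustrated with a short sketch. It assumes the standard genetic code and uses hypothetical helper names (`synonymous_codons`, `count_degenerate_variants`, `translate`) that do not appear in this disclosure; it is offered only as an illustration of how many distinct coding sequences can encode one peptide.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
# Assumption: the standard (nuclear) code applies; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_degenerate_variants(peptide):
    """Count the distinct coding sequences that encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(synonymous_codons(aa))
    return total


def translate(dna):
    """Translate a DNA coding sequence in frame, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))
```

For instance, methionine and tryptophan each have a single codon while leucine has six, so a peptide containing those three residues can be encoded by six different coding sequences.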

Polynucleotides encoding polypeptides and proteins that are variants of the polypeptides 
and proteins encoded by the polynucleotides and related cDNA and genes are also within the 
scope of the invention. The variants differ from wild type protein in having one or more amino
acid substitutions that either enhance, add, or diminish a biological activity of the wild type 
protein. Once the amino acid change is selected, a polynucleotide encoding that variant is 
constructed according to the invention. 

The following detailed description describes the polynucleotide compositions 
encompassed by the invention, methods for obtaining cDNA or genomic DNA encoding a full- 
length gene product, expression of these polynucleotides and genes, identification of structural 
motifs of the polynucleotides and genes, identification of the function of a gene product encoded 
by a gene corresponding to a polynucleotide of the invention, use of the provided 
polynucleotides as probes and in mapping and in tissue profiling, use of the corresponding 
polypeptides and other gene products to raise antibodies, and use of the polynucleotides and 
their encoded gene products for therapeutic and diagnostic purposes. 

I. Polynucleotide Compositions 

The scope of the invention with respect to polynucleotide compositions includes, but is not 
necessarily limited to, polynucleotides having a sequence set forth in any one of SEQ ID 
NOS: 1-844; polynucleotides obtained from the biological materials described herein or other 
biological sources (particularly human sources) by hybridization under stringent conditions 
(particularly conditions of high stringency); genes corresponding to the provided 
polynucleotides; variants of the provided polynucleotides and their corresponding genes, 
particularly those variants that retain a biological activity of the encoded gene product (e.g., a
biological activity ascribed to a gene product corresponding to the provided polynucleotides as a
result of the assignment of the gene product to a protein family(ies) and/or identification of a
functional domain present in the gene product). Other nucleic acid compositions contemplated 
by and within the scope of the present invention will be readily apparent to one of ordinary skill 
in the art when provided with the disclosure here. 



The invention features polynucleotides that are expressed in cells of human tissue, 
specifically human colon, breast, and/or lung tissue. Novel nucleic acid compositions of the 
invention of particular interest comprise a sequence set forth in any one of SEQ ID NOS: 1-844 
or an identifying sequence thereof. An "identifying sequence" is a contiguous sequence of 
residues at least about 10 nt to about 20 nt in length, usually at least about 50 nt to about 100 nt 
in length, that uniquely identifies a polynucleotide sequence, e.g., exhibits less than 90%, 
usually less than about 80% to about 85% sequence identity to any contiguous nucleotide 
sequence of more than about 20 nt. Thus, the subject novel nucleic acid compositions include 
full length cDNAs or mRNAs that encompass an identifying sequence of contiguous nucleotides 
from any one of SEQ ID NOS: 1-844. 

The polynucleotides of the invention also include polynucleotides having sequence 
similarity or sequence identity. Nucleic acids having sequence similarity are detected by 
hybridization under low stringency conditions, for example, at 50°C and 10XSSC (0.9 M 
saline/0.09 M sodium citrate) and remain bound when subjected to washing at 55°C in 1XSSC. 
Sequence identity can be determined by hybridization under stringent conditions, for example, 
at 50°C or higher and 0.1XSSC (9 mM saline/0.9 mM sodium citrate). Hybridization methods 
and conditions are well known in the art, see, e.g., U.S. Patent No. 5,707,829. Nucleic acids that 
are substantially identical to the provided polynucleotide sequences, e.g. allelic variants, 
genetically altered versions of the gene, etc., bind to the provided polynucleotide sequences 
(SEQ ID NOS: 1-844) under stringent hybridization conditions. By using probes, particularly
labeled probes of DNA sequences, one can isolate homologous or related genes. The source of 
homologous genes can be any species, e.g. primate species, particularly human; rodents, such as 
rats and mice, canines, felines, bovines, ovines, equines, yeast, nematodes, etc. 



Preferably, hybridization is performed using at least 15 contiguous nucleotides of at least
one of SEQ ID NOS: 1-844. That is, when at least 15 contiguous nucleotides of one of the
disclosed SEQ ID NOS. is used as a probe, the probe will preferentially hybridize with a gene or
mRNA (of the biological material) comprising the complementary sequence, allowing the
identification and retrieval of the nucleic acids of the biological material that uniquely hybridize
to the selected probe. Probes from more than one SEQ ID NO. will hybridize with the same
gene or mRNA if the cDNA from which they were derived corresponds to one mRNA. Probes
of more than 15 nucleotides can be used, but 15 nucleotides represents enough sequence for
unique identification.
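
As a rough illustration of probe selection, the sketch below enumerates every 15-nucleotide window of a sequence and keeps those that do not occur, on either strand, in a set of background sequences. The function names and the exact-match criterion are assumptions made for illustration; actual hybridization specificity depends on conditions not modeled here.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(seq, length=15):
    """List every contiguous window of the given length."""
    seq = seq.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def unique_probes(target, background, length=15):
    """Keep windows of target absent from both strands of every background sequence.

    Assumption: exact string matching stands in for hybridization.
    """
    background = [b.upper() for b in background]
    unique = []
    for probe in candidate_probes(target, length):
        rc = reverse_complement(probe)
        if not any(probe in b or rc in b for b in background):
            unique.append(probe)
    return unique
```

A sequence of 17 nt yields three candidate 15-mers; any of them also found in a background sequence, in either orientation, is discarded.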

The polynucleotides of the invention also include naturally occurring variants of the 
nucleotide sequences (e.g., degenerate variants, allelic variants, etc.). Variants of the 
polynucleotides of the invention are identified by hybridization of putative variants with 
nucleotide sequences disclosed herein, preferably by hybridization under stringent conditions.
For example, by using appropriate wash conditions, variants of the polynucleotides of the
invention can be identified where the allelic variant exhibits at most about 25-30% base pair
mismatches relative to the selected polynucleotide probe. In general, allelic variants contain 15-
25% base pair mismatches, and can contain as few as 5-15%, or 2-5%, or 1-2% base pair
mismatches, or even a single base-pair mismatch.
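
The mismatch percentages above can be computed directly for two pre-aligned sequences of equal length. The following is a minimal sketch under that assumption; the function names and the 30% default threshold (taken from the upper bound stated above) are illustrative only.

```python
def percent_mismatch(probe, candidate):
    """Percentage of positions at which two equal-length sequences differ."""
    if not probe or len(probe) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(probe.upper(), candidate.upper()))
    return 100.0 * mismatches / len(probe)


def within_variant_threshold(probe, candidate, max_percent=30.0):
    """True when the candidate differs from the probe by at most max_percent."""
    return percent_mismatch(probe, candidate) <= max_percent
```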

The invention also encompasses homologs corresponding to the polynucleotides of SEQ 
ID NOS: 1-844, where the source of homologous genes can be any mammalian species, e.g., 
primate species, particularly human; rodents, such as rats, canines, felines, bovines, ovines, 
equines, yeast, nematodes, etc. Between mammalian species, e.g., human and mouse, homologs 
have substantial sequence similarity, e.g., at least 75% sequence identity, usually at least 90%, 
more usually at least 95% between nucleotide sequences. Sequence similarity is calculated 
based on a reference sequence, which may be a subset of a larger sequence, such as a conserved 
motif, coding region, flanking region, etc. A reference sequence will usually be at least about 18
contiguous nt long, more usually at least about 30 nt long, and may extend to the complete
sequence that is being compared. Algorithms for sequence analysis are known in the art, such as
BLAST, described in Altschul et al., J. Mol. Biol. (1990) 215:403-10.



In general, variants of the invention have a sequence identity greater than at least about 
65%, preferably at least about 75%, more preferably at least about 85%, and can be greater than 
at least about 90% or more as determined by the Smith-Waterman homology search algorithm
as implemented in MPSRCH program (Oxford Molecular). For the purposes of this invention, a
preferred method of calculating percent identity is the Smith-Waterman algorithm, using the
following. Global DNA sequence identity must be greater than 65% as determined by the
Smith-Waterman homology search algorithm as implemented in MPSRCH program (Oxford
Molecular) using an affine gap search with the following search parameters: gap open penalty, 
12; and gap extension penalty, 1. 
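
For orientation, a Smith-Waterman local alignment score with affine gap penalties can be sketched as follows. This is not the MPSRCH implementation: the match and mismatch scores (+5 and -4) are assumed values, and the convention that the first gapped position costs the gap open penalty and each further position costs the gap extension penalty is likewise an assumption. Only the gap open penalty of 12 and gap extension penalty of 1 come from the text above.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Best local alignment score of a and b under affine gap penalties.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # h: best score of an alignment ending at (i, j); 0 restarts the alignment.
    # e: best score ending with a gap that consumes a character of b.
    # f: best score ending with a gap that consumes a character of a.
    h = [[0] * (m + 1) for _ in range(n + 1)]
    e = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e[i][j] = max(h[i][j - 1] - gap_open, e[i][j - 1] - gap_extend)
            f[i][j] = max(h[i - 1][j] - gap_open, f[i - 1][j] - gap_extend)
            score = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + score, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best
```

The sketch returns only the score; deriving percent identity would additionally require a traceback through the three matrices, which is omitted here for brevity.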

The subject nucleic acids can be cDNAs or genomic DNAs, as well as fragments thereof, 
particularly fragments that encode a biologically active gene product and/or are useful in the 
methods disclosed herein (e.g., in diagnosis, as a unique identifier of a differentially expressed 
gene of interest, etc.). The term "cDNA" as used herein is intended to include all nucleic acids 
that share the arrangement of sequence elements found in native mature mRNA species, where
sequence elements are exons and 3' and 5' non-coding regions. Normally mRNA species have
contiguous exons, with the intervening introns, when present, being removed by nuclear RNA
splicing, to create a continuous open reading frame encoding a polypeptide of the invention.

A genomic sequence of interest comprises the nucleic acid present between the initiation 
codon and the stop codon, as defined in the listed sequences, including all of the introns that are 
normally present in a native chromosome. It can further include the 3' and 5' untranslated regions
found in the mature mRNA. It can further include specific transcriptional and translational
regulatory sequences, such as promoters, enhancers, etc., including about 1 kb, but possibly
more, of flanking genomic DNA at either the 5' or 3' end of the transcribed region. The
genomic DNA can be isolated as a fragment of 100 kbp or smaller, and substantially free of
flanking chromosomal sequence. The genomic DNA flanking the coding region, either 3' or 5',
or internal regulatory sequences as sometimes found in introns, contains sequences required for
proper tissue, stage-specific, or disease-state specific expression. 

The nucleic acid compositions of the subject invention can encode all or a part of the 
subject differentially expressed polypeptides. Double or single stranded fragments can be 
obtained from the DNA sequence by chemically synthesizing oligonucleotides in accordance
with conventional methods, by restriction enzyme digestion, by PCR amplification, etc. Isolated 
polynucleotides and polynucleotide fragments of the invention comprise at least about 10, about 
15, about 20, about 35, about 50, about 100, about 150 to about 200, about 250 to about 300, or 
about 350 contiguous nucleotides selected from the polynucleotide sequences as shown in SEQ 
ID NOS: 1-844. For the most part, fragments will be of at least 15 nt, usually at least 18 nt or
25 nt, and up to at least about 50 contiguous nt in length or more. In a preferred embodiment, 
the polynucleotide molecules comprise a contiguous sequence of at least twelve nucleotides 
selected from the group consisting of the polynucleotides shown in SEQ ID NOS: 1-844. 

Probes specific to the polynucleotides of the invention can be generated using the 
polynucleotide sequences disclosed in SEQ ID NOS: 1-844. The probes are preferably at least
about 12, 15, 16, 18, 20, 22, 24, or 25 nucleotide fragments of a corresponding contiguous
sequence of SEQ ID NOS: 1-844, and can be less than 2, 1, 0.5, 0.1, or 0.05 kb in length. The
probes can be synthesized chemically or can be generated from longer polynucleotides using 
restriction enzymes. The probes can be labeled, for example, with a radioactive, biotinylated, or 
fluorescent tag. Preferably, probes are designed based upon an identifying sequence of a 
polynucleotide of one of SEQ ID NOS: 1-844. More preferably, probes are designed based on a 
contiguous sequence of one of the subject polynucleotides that remains unmasked following
application of a masking program for masking low complexity (e.g., XBLAST) to the sequence,
i.e., one would select an unmasked region, as indicated by the polynucleotides outside the poly-n
stretches of the masked sequence produced by the masking program.
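
Selecting unmasked regions can be sketched as a scan for runs of residues other than "N" in the masked output. The sketch assumes the masking program replaces low-complexity residues with "N" (or "n"); the function name and the default minimum length of 12 nt are illustrative choices, not part of any masking program.

```python
import re


def unmasked_regions(masked_seq, min_length=12):
    """Return (start, end, sequence) for each run of unmasked residues.

    Assumption: masked positions appear as "N" or "n"; end is exclusive.
    """
    regions = []
    for match in re.finditer(r"[^Nn]+", masked_seq):
        if match.end() - match.start() >= min_length:
            regions.append((match.start(), match.end(), match.group()))
    return regions
```

Each returned region is a candidate from which a probe of the desired length could then be chosen.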

The polynucleotides of the subject invention are isolated and obtained in substantial 
purity, generally as other than an intact chromosome. Usually, the polynucleotides, either as 
DNA or RNA, will be obtained substantially free of other naturally-occurring nucleic acid 
sequences, generally being at least about 50%, usually at least about 90% pure and are typically 
"recombinant", e.g., flanked by one or more nucleotides with which it is not normally associated 
on a naturally occurring chromosome. 

The polynucleotides of the invention can be provided as a linear molecule or within a 
circular molecule. They can be provided within autonomously replicating molecules (vectors) 
or within molecules without replication sequences. They can be regulated by their own or by
other regulatory sequences, as is known in the art. The polynucleotides of the invention can be 
introduced into suitable host cells using a variety of techniques which are available in the art, 
such as transferrin polycation-mediated DNA transfer, transfection with naked or encapsulated 
nucleic acids, liposome-mediated DNA transfer, intracellular transportation of DNA-coated 
latex beads, protoplast fusion, viral infection, electroporation, gene gun, calcium phosphate- 
mediated transfection, and the like. 

The subject nucleic acid compositions can be used to, for example, produce 
polypeptides, as probes for the detection of mRNA of the invention in biological samples (e.g.,
extracts of human cells), to generate additional copies of the polynucleotides, to generate
ribozymes or antisense oligonucleotides, and as single stranded DNA probes or as triple-strand 
forming oligonucleotides. The probes described herein can be used to, for example, determine 
the presence or absence of the polynucleotide sequences as shown in SEQ ID NOS: 1-844 or 
variants thereof in a sample. These and other uses are described in more detail below. 

Use of Polynucleotides to Obtain Full-Length cDNA and Full-Length Human Gene and 
Promoter Region 

Full-length cDNA molecules comprising the disclosed polynucleotides are obtained as 
follows. A polynucleotide having a sequence of one of SEQ ID NOS: 1-844, or a portion thereof 
comprising at least 12, 15, 18, or 20 nucleotides, is used as a hybridization probe to detect 
hybridizing members of a cDNA library using probe design methods, cloning methods, and 
clone selection techniques such as those described in U.S. Patent No. 5,654,173. Libraries of 
cDNA are made from selected tissues, such as normal or tumor tissue, or from tissues of a 
mammal treated with, for example, a pharmaceutical agent. Preferably, the tissue is the same as 
the tissue from which the polynucleotides of the invention were isolated, as both the 
polynucleotides described herein and the cDNA represent expressed genes. Most preferably, the 
cDNA library is made from the biological material described herein in the Examples. 
Alternatively, many cDNA libraries are available commercially. (Sambrook et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed, (1989) Cold Spring Harbor Press, Cold Spring Harbor,
NY). The choice of cell type for library construction can be made after the identity of the
protein encoded by the gene corresponding to the polynucleotide of the invention is known. 
This will indicate which tissue and cell types are likely to express the related gene, and thus 
represent a suitable source for the mRNA for generating the cDNA. Where the provided 
polynucleotides are isolated from cDNA libraries, the libraries are prepared from mRNA of 
human colon cells, more preferably, human colon cancer cells, even more preferably, from a 
highly metastatic colon cell, Km12L4-A.

Techniques for producing and probing nucleic acid sequence libraries are described, for 
example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed, (1989) Cold
Spring Harbor Press, Cold Spring Harbor, NY. The cDNA can be prepared by using primers 
based on sequence from SEQ ID NOS: 1-844. In one embodiment, the cDNA library can be 
made from only poly-adenylated mRNA. Thus, poly-T primers can be used to prepare cDNA 
from the mRNA. 

Members of the library that are larger than the provided polynucleotides, and preferably 
that encompass the complete coding sequence of the native message, are obtained. In order to 
confirm that the entire cDNA has been obtained, RNA protection experiments are performed as 
follows. Hybridization of a full-length cDNA to an mRNA will protect the RNA from RNase 
degradation. If the cDNA is not full length, then the portions of the mRNA that are not 
hybridized will be subject to RNase degradation. This is assayed, as is known in the art, by 
changes in electrophoretic mobility on polyacrylamide gels, or by detection of released 
monoribonucleotides. Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed,
(1989) Cold Spring Harbor Press, Cold Spring Harbor, NY. In order to obtain additional
sequences 5' to the end of a partial cDNA, 5' RACE (PCR Protocols: A Guide to Methods and
Applications, (1990) Academic Press, Inc.) is performed. 

Genomic DNA is isolated using the provided polynucleotides in a manner similar to the 
isolation of full-length cDNAs. Briefly, the provided polynucleotides, or portions thereof, are 
used as probes to libraries of genomic DNA. Preferably, the library is obtained from the cell 
type that was used to generate the polynucleotides of the invention, but this is not essential. 
Most preferably, the genomic DNA is obtained from the biological material described herein in 
the Examples. Such libraries can be in vectors suitable for carrying large segments of a genome,
such as P1 or YAC, as described in detail in Sambrook et al., 9.4-9.30. In addition, genomic
sequences can be isolated from human BAC libraries, which are commercially available from
Research Genetics, Inc., Huntsville, Alabama, USA, for example. In order to obtain additional 5'
or 3' sequences, chromosome walking is performed, as described in Sambrook et al., such that
adjacent and overlapping fragments of genomic DNA are isolated. These are mapped and pieced 
together, as is known in the art, using restriction digestion enzymes and DNA ligase. 

Using the polynucleotide sequences of the invention, corresponding full-length genes 
can be isolated using both classical and PCR methods to construct and probe cDNA libraries. 
Using either method, Northern blots, preferably, are performed on a number of cell types to 
determine which cell lines express the gene of interest at the highest level. Classical methods of 
constructing cDNA libraries are taught in Sambrook et al., supra. With these methods, cDNA
can be produced from mRNA and inserted into viral or expression vectors. Typically, libraries 
of mRNA comprising poly(A) tails can be produced with poly(T) primers. Similarly, cDNA 
libraries can be produced using the instant sequences as primers. 

PCR methods are used to amplify the members of a cDNA library that comprise the
desired insert. In this case, the desired insert will contain sequence from the full length cDNA
that corresponds to the instant polynucleotides. Such PCR methods include gene trapping and 
RACE methods. Gene trapping entails inserting a member of a cDNA library into a vector. The 
vector then is denatured to produce single stranded molecules. Next, a substrate-bound probe, 
such as a biotinylated oligo, is used to trap cDNA inserts of interest. Biotinylated probes can be
linked to an avidin-bound solid substrate. PCR methods can be used to amplify the trapped 
cDNA. To trap sequences corresponding to the full length genes, the labeled probe sequence is 
based on the polynucleotide sequences of the invention. Random primers or primers specific to 
the library vector can be used to amplify the trapped cDNA. Such gene trapping techniques are 
described in Gruber et al., WO 95/04745 and Gruber et al., U.S. Pat. No. 5,500,356. Kits are
commercially available to perform gene trapping experiments from, for example, Life 
Technologies, Gaithersburg, Maryland, USA. 

"Rapid amplification of cDNA ends," or RACE, is a PCR method of amplifying cDNAs
from a number of different RNAs. The cDNAs are ligated to an oligonucleotide linker, and 
amplified by PCR using two primers. One primer is based on sequence from the instant 
polynucleotides, for which full length sequence is desired, and a second primer comprises 
sequence that hybridizes to the oligonucleotide linker to amplify the cDNA. A description of
this method is reported in WO 97/19110. In preferred embodiments of RACE, a common
primer is designed to anneal to an arbitrary adaptor sequence ligated to cDNA ends (Apte and
Siebert, Biotechniques (1993) 15:890-893; Edwards et al., Nuc. Acids Res. (1991) 19:5227-
5232). When a single gene-specific RACE primer is paired with the common primer,
preferential amplification of sequences between the single gene specific primer and the common 
primer occurs. Commercial cDNA pools modified for use in RACE are available. 

Another PCR-based method generates a full-length cDNA library with anchored ends
without needing specific knowledge of the cDNA sequence. The method uses lock-docking
primers (I-VI), where one primer, polyTV (I-III), locks over the polyA tail of eukaryotic mRNA,
producing first strand synthesis, and a second primer, polyGH (IV-VI), locks onto the polyC tail
added by terminal deoxynucleotidyl transferase (TdT). This method is described in WO 
96/40998. 

The promoter region of a gene generally is located 5' to the initiation site for RNA
polymerase II. Hundreds of promoter regions contain the "TATA" box, a sequence such as
TATTA or TATAA, which is sensitive to mutations. The promoter region can be obtained by
performing 5' RACE using a primer from the coding region of the gene. Alternatively, the
cDNA can be used as a probe for the genomic sequence, and the region 5' to the coding region
is identified by "walking up." If the gene is highly expressed or differentially expressed, the 
promoter from the gene can be of use in a regulatory construct for a heterologous gene. 
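
As a simple illustration, a candidate promoter region can be scanned for the TATA-like sequences named above (TATAA or TATTA). The pattern and function name below are assumptions for illustration; a match indicates only a candidate motif, not a functional promoter element.

```python
import re

# Zero-width lookahead so that overlapping occurrences are all reported.
TATA_PATTERN = re.compile(r"(?=(TAT[AT]A))")


def find_tata_like_motifs(seq):
    """Return (position, motif) for each TATAA or TATTA occurrence (0-based)."""
    return [(m.start(), m.group(1)) for m in TATA_PATTERN.finditer(seq.upper())]
```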

Once the full-length cDNA or gene is obtained, DNA encoding variants can be prepared 
by site-directed mutagenesis, described in detail in Sambrook et al., 15.3-15.63. The choice of
codon or nucleotide to be replaced can be based on disclosure herein on optional changes in 
amino acids to achieve altered protein structure and/or function. 

As an alternative method to obtaining DNA or RNA from a biological material, nucleic
acid comprising nucleotides having the sequence of one or more polynucleotides of the
invention can be synthesized. Thus, the invention encompasses nucleic acid molecules ranging
in length from 15 nucleotides (corresponding to at least 15 contiguous nucleotides of one of
SEQ ID NOS: 1-844) up to a maximum length suitable for one or more biological 
manipulations, including replication and expression, of the nucleic acid molecule. The 
invention includes but is not limited to (a) nucleic acid having the size of a full gene, and 
comprising at least one of SEQ ID NOS: 1-844; (b) the nucleic acid of (a) also comprising at 
least one additional gene, operably linked to permit expression of a fusion protein; (c) an 
expression vector comprising (a) or (b); (d) a plasmid comprising (a) or (b); and (e) a 
recombinant viral particle comprising (a) or (b). Once provided with the polynucleotides 
disclosed herein, construction or preparation of (a)-(e) is well within the skill in the art. 

The sequence of a nucleic acid comprising at least 15 contiguous nucleotides of at least 
any one of SEQ ID NOS: 1-844, preferably the entire sequence of at least any one of SEQ ID 
NOS: 1-844, is not limited and can be any sequence of A, T, G, and/or C (for DNA) and A, U, 
G, and/or C (for RNA) or modified bases thereof, including inosine and pseudouridine. The 
choice of sequence will depend on the desired function and can be dictated by coding regions 
desired, the intron-like regions desired, and the regulatory regions desired. Where the entire 
sequence of any one of SEQ ID NOS: 1-844 is within the nucleic acid, the nucleic acid obtained 
is referred to herein as a polynucleotide comprising the sequence of any one of SEQ ID NOS: 
1-844. 

II. Expression of Polypeptide Encoded by Full-Length cDNA or Full-Length Gene 

The provided polynucleotide (e.g., a polynucleotide having a sequence of one of SEQ ID 
NOS: 1-844), the corresponding cDNA, or the full-length gene is used to express a partial or 
complete gene product. 

Constructs of polynucleotides having sequences of SEQ ID NOS: 1-844 can be generated 
synthetically. Alternatively, single-step assembly of a gene and entire plasmid from large 
numbers of oligodeoxyribonucleotides is described by, e.g., Stemmer et al., Gene (Amsterdam) 
(1995) 164(1):49-53. This method, assembly PCR, synthesizes long DNA sequences 
from large numbers of oligodeoxyribonucleotides (oligos). The method is derived 
from DNA shuffling (Stemmer, Nature (1994) 370:389-391), and does not rely on DNA ligase, 
but instead relies on DNA polymerase to build increasingly longer DNA fragments during the 
assembly process. For example, a 1.1-kb fragment containing the TEM-1 
beta-lactamase-encoding gene (bla) can be assembled in a single reaction from a total of 56 oligos, each 40 
nucleotides (nt) in length. The synthetic gene can be PCR amplified and cloned in a vector 
containing the tetracycline-resistance gene (Tc-R) as the sole selectable marker. Without 
relying on ampicillin (Ap) selection, 76% of the Tc-R colonies were Ap-R, making this 
approach a general method for the rapid and cost-effective synthesis of any gene. 
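
As a rough illustration of how a target sequence might be divided into oligos for assembly PCR, the following Python sketch tiles the top strand into 40-nt oligos and the bottom strand into 40-nt oligos offset by 20 nt, so that each oligo overlaps two oligos of the opposite strand. The tiling scheme, the 20-nt offset, and the function names are assumptions made for illustration; they are not taken from Stemmer et al., and a practical design would also balance overlap melting temperatures.

```python
# Hypothetical helper names; the offset tiling scheme is an illustrative assumption.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(seq, oligo_len=40):
    """Tile a sequence into top-strand oligos and offset bottom-strand oligos.

    Top-strand oligos start at 0, oligo_len, 2*oligo_len, ...
    Bottom-strand oligos are reverse complements of windows that start
    half an oligo later, so every junction on one strand is bridged by
    an oligo of the other strand.
    """
    half = oligo_len // 2
    top = [seq[i:i + oligo_len] for i in range(0, len(seq), oligo_len)]
    bottom = [
        reverse_complement(seq[i:i + oligo_len])
        for i in range(half, len(seq) - half, oligo_len)
    ]
    return top, bottom


if __name__ == "__main__":
    target = "ACGT" * 280  # placeholder 1120-nt sequence, not a real gene
    top, bottom = tile_oligos(target)
    print(len(top), "top-strand oligos;", len(bottom), "bottom-strand oligos")
```

During assembly, the polymerase extends each annealed overlap, so no ligase step is needed to join neighboring oligos.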

Appropriate polynucleotide constructs are purified using standard recombinant DNA 
techniques as described in, for example, Sambrook et al., Molecular Cloning: A Laboratory 
Manual, 2nd Ed. (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY, and under current 
regulations described in United States Dept. of HHS, National Institutes of Health (NIH) 
Guidelines for Recombinant DNA Research. The gene product encoded by a polynucleotide of 
the invention is expressed in any expression system, including, for example, bacterial, yeast, 
insect, amphibian and mammalian systems. Suitable vectors and host cells are described in U.S. 
Patent No. 5,654,173. 

Bacteria. Expression systems in bacteria include those described in Chang et al., Nature 
(1978) 275:615; Goeddel et al., Nature (1979) 281:544; Goeddel et al., Nucleic Acids Res. 
(1980) 8:4057; EP 0 036,776; U.S. Patent No. 4,551,433; DeBoer et al., Proc. Natl. Acad. Sci. 
(USA) (1983) 80:21-25; and Siebenlist et al., Cell (1980) 20:269. 

Yeast. Expression systems in yeast include those described in Hinnen et al., Proc. Natl. 
Acad. Sci. (USA) (1978) 75:1929; Ito et al., J. Bacteriol. (1983) 153:163; Kurtz et al., Mol. Cell. 
Biol. (1986) 6:142; Kunze et al., J. Basic Microbiol. (1985) 25:141; Gleeson et al., J. Gen. 
Microbiol. (1986) 132:3459; Roggenkamp et al., Mol. Gen. Genet. (1986) 202:302; Das et al., J. 
Bacteriol. (1984) 158:1165; De Louvencourt et al., J. Bacteriol. (1983) 154:131; Van den Berg 
et al., Bio/Technology (1990) 8:135; Cregg et al., Mol. Cell. Biol. (1985) 5:3376; U.S. Patent 
Nos. 4,837,148 and 4,929,555; Beach and Nurse, 
Nature (1981) 300:706; Davidow et al., Curr. Genet. (1985) 10:380; Gaillardin et al., Curr. 
Genet. (1985) 10:49; Ballance et al., Biochem. Biophys. Res. Commun. (1983) 112:284-289; 
Tilburn et al., Gene (1983) 26:205-221; Yelton et al., Proc. Natl. Acad. Sci. (USA) (1984) 
81:1470-1474; Kelly and Hynes, EMBO J. (1985) 4:475-479; EP 0 244,234; and WO 91/00357. 

Insect Cells. Expression of heterologous genes in insects is accomplished as described 
in U.S. Patent No. 4,745,051; Friesen et al., "The Regulation of Baculovirus Gene Expression", 
in: The Molecular Biology Of Baculoviruses (1986) (W. Doerfler, ed.); EP 0 127,839; EP 0 
155,476; and Vlak et al., J. Gen. Virol. (1988) 69:765-776; Miller et al., Ann. Rev. Microbiol. 
(1988) 42:177; Carbonell et al., Gene (1988) 73:409; Maeda et al., Nature (1985) 315:592-594; 
Lebacq-Verheyden et al., Mol. Cell. Biol. (1988) 8:3129; Smith et al., Proc. Natl. Acad. Sci. 
(USA) (1985) 82:8844; Miyajima et al., Gene (1987) 58:273; and Martin et al., DNA (1988) 
7:99. Numerous baculoviral strains and variants and corresponding permissive insect host cells 
from hosts are described in Luckow et al., Bio/Technology (1988) 6:47-55, Miller et al., Genetic 
Engineering (1986) 8:277-279, and Maeda et al., Nature (1985) 315:592-594. 

Mammalian Cells. Mammalian expression is accomplished as described in Dijkema et 
al., EMBO J. (1985) 4:761, Gorman et al., Proc. Natl. Acad. Sci. (USA) (1982) 79:6777, Boshart 
et al., Cell (1985) 41:521 and U.S. Patent No. 4,399,216. Other features of mammalian 
expression are facilitated as described in Ham and Wallace, Meth. Enz. (1979) 58:44, Barnes 
and Sato, Anal. Biochem. (1980) 102:255, U.S. Patent Nos. 4,767,704, 4,657,866, 4,927,762, 
4,560,655, WO 90/103430, WO 87/00195, and U.S. RE 30,985. 

Polynucleotide molecules comprising a polynucleotide sequence provided herein are 
propagated by placing the molecule in a vector. Viral and non-viral vectors are used, including 
plasmids. The choice of plasmid will depend on the type of cell in which propagation is desired 
and the purpose of propagation. Certain vectors are useful for amplifying and making large 
amounts of the desired DNA sequence. Other vectors are suitable for expression in cells in 
culture. Still other vectors are suitable for transfer and expression in cells in a whole animal or 
person. The choice of appropriate vector is well within the skill of the art. Many such vectors 
are available commercially. The partial or full-length polynucleotide is inserted into a vector 
typically by means of DNA ligase attachment to a cleaved restriction enzyme site in the vector. 
Alternatively, the desired nucleotide sequence can be inserted by homologous recombination in 
vivo. Typically this is accomplished by attaching regions of homology to the vector on the 
flanks of the desired nucleotide sequence. Regions of homology are added by ligation of 
oligonucleotides, or by polymerase chain reaction using primers comprising both the region of 
homology and a portion of the desired nucleotide sequence, for example. 
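
The primer design described above, in which each primer carries both a region of homology to the vector and a portion of the desired nucleotide sequence, can be sketched in Python as follows. The function name, the 20-nt annealing length, and the argument names are hypothetical choices for illustration; real designs would also check melting temperature and secondary structure.

```python
# Hypothetical helper; names and default lengths are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def homology_primers(insert, left_arm, right_arm, anneal_len=20):
    """Build PCR primers that add vector homology arms to an insert.

    forward = left homology arm + first anneal_len bases of the insert
    reverse = reverse complement of (last anneal_len bases + right arm)
    Amplifying the insert with these primers yields a product flanked
    by both homology regions.
    """
    forward = left_arm + insert[:anneal_len]
    reverse = reverse_complement(insert[-anneal_len:] + right_arm)
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = homology_primers(
        "ATGGCCAAGGTTCCGGAATTCTAA", "AAAA", "CCCC", anneal_len=6
    )
    print(fwd)  # left arm followed by the start of the insert
    print(rev)  # reverse complement of the insert end plus right arm
```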

The polynucleotides set forth in SEQ ID NOS: 1-844 or their corresponding full-length 
polynucleotides are linked to regulatory sequences as appropriate to obtain the desired 
expression properties. These can include promoters (attached either at the 5' end of the sense 
strand or at the 3' end of the antisense strand), enhancers, terminators, operators, repressors, and 
inducers. The promoters can be regulated or constitutive. In some situations it may be desirable 
to use conditionally active promoters, such as tissue-specific or developmental stage-specific 
promoters. These are linked to the desired nucleotide sequence using the techniques described 
above for linkage to vectors. Any techniques known in the art can be used. 

When any of the above host cells, or other appropriate host cells or organisms, are used 
to replicate and/or express the polynucleotides or nucleic acids of the invention, the resulting 
replicated nucleic acid, RNA, expressed protein or polypeptide, is within the scope of the 
invention as a product of the host cell or organism. The product is recovered by any appropriate 
means known in the art. 

Once the gene corresponding to a selected polynucleotide is identified, its expression can 
be regulated in the cell to which the gene is native. For example, an endogenous gene of a cell 
can be regulated by an exogenous regulatory sequence as disclosed in U.S. Patent No. 
5,641,670. 

III. Identification of Functional and Structural Motifs of Novel Genes 

A. Screening Polynucleotide Sequences and Amino Acid Sequences Against 
Publicly Available Databases 

Translations of the nucleotide sequence of the provided polynucleotides, cDNAs 
or full genes can be aligned with individual known sequences. Similarity with individual 
sequences can be used to determine the activity of the polypeptides encoded by the 
polynucleotides of the invention. For example, sequences that show similarity with a 
chemokine sequence can exhibit chemokine activities. Also, sequences exhibiting 
similarity with more than one individual sequence can exhibit activities that are 
characteristic of either or both individual sequences. 

The full length sequences and fragments of the polynucleotide sequences of the nearest 
neighbors can be used as probes and primers to identify and isolate the full length sequence 
corresponding to provided polynucleotides. The nearest neighbors can indicate a tissue or cell 
type to be used to construct a library for the full-length sequences corresponding to the provided 
polynucleotides. 

Typically, a selected polynucleotide is translated in all six frames to determine the best 
alignment with the individual sequences. The sequences disclosed herein in the Sequence 
Listing are in a 5' to 3' orientation and translation in three frames can be sufficient (with a few 
specific exceptions as described in the Examples). These amino acid sequences are referred to, 
generally, as query sequences, which will be aligned with the individual sequences. Databases 
with individual sequences are described in "Computer Methods for Macromolecular Sequence 
Analysis" Methods in Enzymology (1996) 266, Doolittle, Academic Press, Inc., a division of 
Harcourt Brace & Co., San Diego, California, USA. Databases include Genbank, EMBL, and 
DNA Database of Japan (DDBJ). 
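
The six-frame translation step can be illustrated with a short Python sketch. It uses the standard genetic code and marks stop codons with "*"; the function names and the `forward_only` option (for sequences already known to be in a 5' to 3' orientation) are illustrative assumptions and do not correspond to any particular program named in this description.

```python
# Standard genetic code, indexed with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq, forward_only=False):
    """Return translations keyed by frame name ('+1'..'+3', '-1'..'-3')."""
    seq = seq.upper().replace("U", "T")
    strands = [("+", seq)]
    if not forward_only:
        strands.append(("-", reverse_complement(seq)))
    return {
        f"{sign}{offset + 1}": translate(strand[offset:])
        for sign, strand in strands
        for offset in range(3)
    }


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Each resulting translation would then serve as a query sequence for alignment against the individual database sequences.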

Query and individual sequences can be aligned using the methods and computer 
programs described above, which include BLAST, available over the world wide web at 
http://www.ncbi.nlm.nih.gov/BLAST/. Another alignment algorithm is Fasta, available in the 
Genetics Computing Group (GCG) package, Madison, Wisconsin, USA, a wholly owned 
subsidiary of Oxford Molecular Group, Inc. Other techniques for alignment are described in 
Doolittle, supra. Preferably, an alignment program that permits gaps in the sequence is utilized 
to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits gaps in 
sequence alignments. See Meth. Mol. Biol. (1997) 70:173-187. Also, the GAP program using 
the Needleman and Wunsch alignment method can be utilized to align sequences. An 
alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. 
MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel 
computer. This approach improves the ability to identify distantly related 
matches, and is especially tolerant of small gaps and nucleotide sequence errors. Amino acid 
sequences encoded by the provided polynucleotides can be used to search both protein and DNA 
databases. 
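
For readers unfamiliar with gapped local alignment, the following Python sketch computes a Smith-Waterman score with a simple linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration; production tools such as those named above use substitution matrices and affine gap penalties, and this sketch is not the implementation used by any of them.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty.
    Only two rows of the scoring matrix are kept in memory, so the
    alignment itself is not recovered, only its score.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    # The second sequence carries one inserted residue relative to the first.
    print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```

Because scores are floored at zero, a poorly matching region does not penalize a strong local match elsewhere, which is why the method tolerates small gaps and isolated sequence errors.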

Results of individual and query sequence alignments can be divided into three 
categories: high similarity, weak similarity, and no similarity. Individual alignment results 
ranging from high similarity to weak similarity provide a basis for determining polypeptide 
activity and/or structure. Parameters for categorizing individual results include: percentage of 
the alignment region length where the strongest alignment is found, percent sequence identity, 
and p value. 

The percentage of the alignment region length is calculated by counting the number of 
residues of the individual sequence found in the region of strongest alignment, e.g., the contiguous 
region of the individual sequence that contains the greatest number of residues that are identical 
to the residues of the corresponding region of the aligned query sequence. This number is 
divided by the total residue length of the query sequence to calculate a percentage. For example, 
a query sequence of 20 amino acid residues might be aligned with a 20 amino acid region of an 
individual sequence. The individual sequence might be identical to amino acid residues 5, 9-15, 
and 17-19 of the query sequence. The region of strongest alignment is thus the region stretching 
from residue 9 to residue 19, an 11 amino acid stretch. The percentage of the alignment region 
length is 11 (length of the region of strongest alignment) divided by 20 (query sequence length), or 55%. 

Percent sequence identity is calculated by counting the number of amino acid matches 
between the query and individual sequence and dividing the total number of matches by the number 
of residues of the individual sequence found in the region of strongest alignment. Thus, the 
percent identity in the example above would be 10 matches divided by 11 amino acids, or 
approximately 90.9%. 
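
The worked example above can be reproduced with a minimal Python sketch. The description does not state how many mismatches a region of strongest alignment may span, so the sketch assumes that matching residues separated by at most one mismatch belong to the same region (the `max_gap` parameter); that rule and the function names are assumptions chosen to reproduce the 9-19 region of the example.

```python
def strongest_region(match_positions, max_gap=1):
    """Return (start, end, matches) for the region with the most matches.

    Matching positions separated by no more than max_gap mismatched
    residues are merged into one region (an assumed rule).
    """
    positions = sorted(set(match_positions))
    regions = []
    start = prev = positions[0]
    matches = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            matches += 1
        else:
            regions.append((start, prev, matches))
            start, matches = pos, 1
        prev = pos
    regions.append((start, prev, matches))
    return max(regions, key=lambda region: region[2])


def alignment_metrics(match_positions, query_length, max_gap=1):
    """Compute percent alignment region length and percent identity."""
    start, end, matches = strongest_region(match_positions, max_gap)
    region_length = end - start + 1
    return {
        "region": (start, end),
        "percent_length": 100 * region_length / query_length,
        "percent_identity": 100 * matches / region_length,
    }


if __name__ == "__main__":
    # Identical residues at positions 5, 9-15, and 17-19 of a 20-residue query.
    matched = [5, *range(9, 16), *range(17, 20)]
    print(alignment_metrics(matched, query_length=20))
```

For the example positions this yields the region 9-19, a percent length of 55%, and a percent identity of about 90.9%, in agreement with the figures in the text.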

P value is the probability that the alignment was produced by chance. For a single 
alignment, the p value can be calculated according to Karlin et al., Proc. Natl. Acad. Sci. (1990) 
87:2264 and Karlin et al., Proc. Natl. Acad. Sci. (1993) 90. The p value of multiple alignments 
using the same query sequence can be calculated using a heuristic approach described in 
Altschul et al., Nat. Genet. (1994) 6:119. Alignment programs such as the BLAST program can 
calculate the p value. 

Another factor to consider for determining identity or similarity is the location of the 
similarity or identity. Strong local alignment can indicate similarity even if the length of 
alignment is short. Sequence identity scattered throughout the length of the query sequence also 
can indicate a similarity between the query and profile sequences. The boundaries of the region 
where the sequences align can be determined according to Doolittle, supra; BLAST or FASTA 
programs; or by determining the area where sequence identity is highest. 

High Similarity. In general, in alignment results considered to be of high similarity, the 
percent of the alignment region length is typically at least about 55% of the total length of the query 
sequence; more typically, at least about 58%; even more typically, at least about 60% of the 
total residue length of the query sequence. Usually, percent length of the alignment region can 
be as much as about 62%; more usually, as much as about 64%; even more usually, as much as 
about 66%. Further, for high similarity, the region of alignment typically exhibits at least 
about 75% sequence identity; more typically, at least about 78%; even more typically, at least 
about 80% sequence identity. Usually, percent sequence identity can be as much as about 82%; 
more usually, as much as about 84%; even more usually, as much as about 86%. 

The p value is used in conjunction with these methods. If high similarity is found, the 
query sequence is considered to have high similarity with a profile sequence when the p value is 
less than or equal to about 10^-2; more usually, less than or equal to about 10^-3; even more 
usually, less than or equal to about 10^-4. More typically, the p value is no more than about 10^-5; 
more typically, no more than about 10^-10; even more typically, no more than 
about 10^-15 for the query sequence to be considered high similarity. 

Weak Similarity. In general, in alignment results considered to be of weak 
similarity, there is no minimum percent length of the alignment region nor minimum length of 
alignment. A better showing of weak similarity is considered when the region of alignment is, 
typically, at least about 15 amino acid residues in length; more typically, at least about 20; even 
more typically, at least about 25 amino acid residues in length. Usually, length of the alignment 
region can be as much as about 30 amino acid residues; more usually, as much as about 40; even 
more usually, as much as about 60 amino acid residues. Further, for weak similarity, the region 
of alignment typically exhibits at least about 35% sequence identity; more typically, at least 
about 40%; even more typically, at least about 45% sequence identity. Usually, percent 
sequence identity can be as much as about 50%; more usually, as much as about 55%; even 
more usually, as much as about 60%. 

If weak similarity is found, the query sequence is considered to have weak similarity with 
a profile sequence when the p value is usually less than or equal to about 10^-2; more usually, less 
than or equal to about 10^-3; even more usually, less than or equal to about 10^-4. More typically, 
the p value is no more than about 10^-5; more usually, no more than about 10^-10; even 
more usually, no more than about 10^-15 for the query sequence to be considered weak 
similarity. 
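
The three categories can be expressed as a simple decision rule. The Python sketch below uses only the least stringent "at least about" thresholds given above (55% alignment region length and 75% identity for high similarity, 35% identity for weak similarity, and a p value of at most 10^-2 for both); the function name and the decision to ignore the upper "as much as" figures are assumptions made for illustration, not a definitive rendering of the criteria.

```python
def classify_alignment(percent_length, percent_identity, p_value):
    """Classify an alignment as 'high', 'weak', or 'none'.

    Thresholds are the least stringent values from the text; tighter
    cutoffs could be substituted for a more conservative screen.
    """
    if p_value > 1e-2:
        return "none"
    if percent_length >= 55 and percent_identity >= 75:
        return "high"
    if percent_identity >= 35:
        # Weak similarity has no minimum alignment region length.
        return "weak"
    return "none"


if __name__ == "__main__":
    print(classify_alignment(55.0, 90.9, 1e-5))
    print(classify_alignment(20.0, 40.0, 1e-3))
    print(classify_alignment(10.0, 20.0, 0.5))
```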

Similarity Determined by Sequence Identity Alone. Sequence identity alone can be used 
to determine similarity of a query sequence to an individual sequence and can indicate the 
activity of the sequence. Such an alignment, preferably, permits gaps to align sequences. 
Typically, the query sequence is related to the profile sequence if the sequence identity over the 
entire query sequence is at least about 15%; more typically, at least about 20%; even more 
typically, at least about 25%; even more typically, at least about 50%. Sequence identity alone 
as a measure of similarity is most useful when the query sequence is usually at least 80 residues 
in length; more usually, at least 90 residues; even more usually, at least 95 amino acid residues in 
length. More typically, similarity can be concluded based on sequence identity alone when the 
query sequence is preferably 100 residues in length; more preferably, 120 residues in length; 
even more preferably, 150 amino acid residues in length. 

Determining Activity from Alignments with Profile and Multiple Aligned Sequences. 
Translations of the provided polynucleotides can be aligned with amino acid profiles that define 
either protein families or common motifs. Also, translations of the provided polynucleotides 
can be aligned to multiple sequence alignments (MSA) comprising the polypeptide sequences of 
members of protein families or motifs. Similarity or identity with profile sequences or MSAs 
can be used to determine the activity of the gene products (e.g., polypeptides) encoded by the 
provided polynucleotides or corresponding cDNA or genes. For example, sequences that show 
an identity or similarity with a chemokine profile or MSA can exhibit chemokine activities. 

Profiles can be designed manually by (1) creating an MSA, which is an alignment of the 
amino acid sequences of members that belong to the family and (2) constructing a statistical 
representation of the alignment. Such methods are described, for example, in Birney et al., 
Nucl. Acid Res. (1996) 24(14):2730-2739. MSAs of some protein families and motifs are 
publicly available. For example, http://genome.wustl.edu/Pfam/ includes MSAs of 547 different 
families and motifs. These MSAs are described also in Sonnhammer et al., Proteins (1997) 28: 
405-420. Other sources over the world wide web include the site at 
http://www.embl-heidelberg.de/argos/ali/ali.html; alternatively, a message can be sent to 
ALI@EMBL-HEIDELBERG.DE for the information. A brief description of these MSAs is reported in 
Pascarella et al., Prot. Eng. (1996) 9(3):249-251. Techniques for building profiles from MSAs 
are described in Sonnhammer et al., supra; Birney et al., supra; and "Computer Methods for 
Macromolecular Sequence Analysis," Methods in Enzymology (1996) 266, Doolittle, Academic 
Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA. 

Similarity between a query sequence and a protein family or motif can be determined by 
(a) comparing the query sequence against the profile and/or (b) aligning the query sequence with 
the members of the family or motif. Typically, a program such as Searchwise is used to 
compare the query sequence to the statistical representation of the multiple alignment, also 
known as a profile. The program is described in Birney et al., supra. Other techniques to 
compare the sequence and profile are described in Sonnhammer et al., supra and Doolittle, 
supra. 

Next, methods described by Feng et al., J. Mol. Evol. (1987) 25:351 and Higgins et al., 
CABIOS (1989) 5:151 can be used to align the query sequence with the members of a family or 
motif, also known as an MSA. Computer programs, such as PILEUP, can be used. See Feng et 
al., infra. In general, the following factors are used to determine if a similarity between a query 
sequence and a profile or MSA exists: (1) number of conserved residues found in the query 
sequence, (2) percentage of conserved residues found in the query sequence, (3) number of 
frameshifts, and (4) spacing between conserved residues. 

Some alignment programs that both translate and align sequences can make any number 
of frameshifts when translating the nucleotide sequence to produce the best alignment. The 
fewer frameshifts needed to produce an alignment, the stronger the similarity or identity 
between the query and profile or MSAs. For example, a weak similarity resulting from no 
frameshifts can be a better indication of activity or structure of a query sequence, than a strong 
similarity resulting from two frameshifts. Preferably, three or fewer frameshifts are found in an 
alignment; more preferably two or fewer frameshifts; even more preferably, one or fewer 
frameshifts; even more preferably, no frameshifts are found in an alignment of query and profile 
or MSAs. 

Conserved residues are those amino acids found at a particular position in all or some of 
the family or motif members. For example, most chemokines contain four conserved cysteines. 
Alternatively, a position is considered conserved if only a certain class of amino acids is found 
in a particular position in all or some of the family members. For example, the N-terminal 
position can contain a positively charged amino acid, such as lysine, arginine, or histidine. 

Typically, a residue of a polypeptide is conserved when a class of amino acids or a single 
amino acid is found at a particular position in at least about 40% of all class members; more 
typically, at least about 50%; even more typically, at least about 60% of the members. Usually, 
a residue is conserved when a class or single amino acid is found in at least about 70% of the 
members of a family or motif; more usually, at least about 80%; even more usually, at least 
about 90%; even more usually, at least about 95%. 

A residue is considered conserved when three unrelated amino acids are found at a 
particular position in some or all of the members; more usually, two unrelated amino acids. 
These residues are conserved when the unrelated amino acids are found at particular positions in 
at least about 40% of all class members; more typically, at least about 50%; even more typically, 
at least about 60% of the members. Usually, a residue is conserved when a class or single 
amino acid is found in at least about 70% of the members of a family or motif; more usually, at 
least about 80%; even more usually, at least about 90%; even more usually, at least about 95%. 

A query sequence has similarity to a profile or MSA when the query sequence comprises 
at least about 25% of the conserved residues of the profile or MSA; more usually, at least about 
30%; even more usually, at least about 40%. Typically, the query sequence has a stronger 
similarity to a profile sequence or MSA when the query sequence comprises at least about 45% 
of the conserved residues of the profile or MSA; more typically, at least about 50%; even more 
typically, at least about 55%. 
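
The conserved-residue criteria can be sketched in Python as follows. The sketch treats a column of an MSA as conserved when a single amino acid occupies at least a chosen fraction of the members (40% by default, the lowest figure given above), then reports the fraction of conserved residues present in a query that has already been aligned to the MSA. Conservation by amino acid class is not modeled, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def conserved_positions(msa, min_fraction=0.4):
    """Map column index -> conserved residue for an equal-length MSA.

    A column is conserved when its most common residue (ignoring gap
    characters '-') occurs in at least min_fraction of the members.
    """
    conserved = {}
    for column, residues in enumerate(zip(*msa)):
        counts = Counter(r for r in residues if r != "-")
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        if count / len(msa) >= min_fraction:
            conserved[column] = residue
    return conserved


def fraction_conserved_in_query(aligned_query, conserved):
    """Fraction of conserved columns where the query has the same residue."""
    if not conserved:
        return 0.0
    hits = sum(
        1 for column, residue in conserved.items()
        if aligned_query[column] == residue
    )
    return hits / len(conserved)


if __name__ == "__main__":
    family = ["CAKC", "CGKC", "CTRC"]  # toy alignment, not real chemokines
    profile = conserved_positions(family, min_fraction=0.6)
    print(profile)
    print(fraction_conserved_in_query("CARA", profile))
```

Under the criteria above, a query containing at least about 25% of the conserved residues would be regarded as having similarity to the profile or MSA.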

B. Screening Polynucleotide and Amino Acid Sequences Against Protein Profiles 

The identity and function of the gene that correlates to a polynucleotide described herein 
can be determined by screening the polynucleotides or their corresponding amino acid 
sequences against profiles of protein families. Such profiles focus on common structural motifs 
among proteins of each family. Publicly available profiles are described above in Section IIIA. 
Additional or alternative profiles are described below. 

In comparing a novel polynucleotide with known sequences, several alignment tools are 
available. Examples include PileUp, which creates a multiple sequence alignment, and is 
described in Feng et al., J. Mol. Evol. (1987) 25:351. Another method, GAP, uses the alignment 
method of Needleman et al., J. Mol. Biol. (1970) 48:443. GAP is best suited for global 
alignment of sequences. A third method, BestFit, functions by inserting gaps to maximize the 
number of matches using the local homology algorithm of Smith et al., Adv. Appl. Math. (1981) 
2:482. Exemplary protein profiles are provided below and in the examples. 

Chemokines. Chemokines are a family of proteins that have been implicated in 
lymphocyte trafficking, inflammatory diseases, angiogenesis, hematopoiesis, and viral infection. 
See, for example, Rollins, Blood (1997) 90:909-928, and Wells et al., J. Leuk. Biol. (1997) 
61:545-550. U.S. Patent No. 5,605,817 discloses DNA encoding a chemokine expressed in fetal 
spleen. U.S. Patent No. 5,656,724 discloses chemokine-like proteins and methods of use. U.S. 
Patent No. 5,602,008 discloses DNA encoding a chemokine expressed by liver. 

Chemokine mutants are polypeptides having an amino acid sequence that possesses at 
least one amino acid substitution, addition, or deletion as compared to native chemokines. 
Fragments possess the same amino acid sequence of the native chemokines; mutants can lack 
the amino and/or carboxyl terminal sequences. Fusions are mutants, fragments, or native 
chemokines that also include amino and/or carboxyl terminal amino acid extensions. 

The number or type of the amino acid changes is not critical, nor is the length or number 
of the amino acid deletions, or amino acid extensions that are incorporated in the chemokines as 
compared to the native chemokine amino acid sequences. A polynucleotide encoding one of 
these variant polypeptides will retain at least about 80% amino acid identity with at least one 
known chemokine. Preferably, these polypeptides will retain at least about 85% amino acid 
sequence identity, more preferably, at least about 90%; even more preferably, at least about 
95%. In addition, the variants exhibit at least 80%; preferably about 90%; more preferably 
about 95% of at least one activity exhibited by a native chemokine, which includes 
immunological, biological, receptor binding, and signal transduction functions. 

Assays for chemotaxis relating to neutrophils are described in Walz et al., Biochem. 
Biophys. Res. Commun. (1987) 149:755, Yoshimura et al., Proc. Natl. Acad. Sci. (USA) (1987) 
84:9233, and Schroder et al., J. Immunol. (1987) 139:3474; to lymphocytes, Larsen et al., 
Science (1989) 243:1464, Carr et al., Proc. Natl. Acad. Sci. (USA) (1994) 91:3652; to 
tumor-infiltrating lymphocytes, Liao et al., J. Exp. Med. (1995) 182:1301; to hematopoietic 
progenitors, Aiuti et al., J. Exp. Med. (1997) 185:111; to monocytes, Valente et al., Biochem. 
(1988) 27:4162; and to natural killer cells, Loetscher et al., J. Immunol. (1996) 156:322, and 
Allavena et al., Eur. J. Immunol. (1994) 24:3233. 

Assays for determining the biological activity of attracting eosinophils are described in 
Dahinden et al., J. Exp. Med. (1994) 179:751, Weber et al., J. Immunol. (1995) 154:4166, and 
Noso et al., Biochem. Biophys. Res. Commun. (1994) 200:1470; for attracting dendritic cells, 
Sozzani et al., J. Immunol. (1995) 155:3292; for attracting basophils, in Dahinden et al., J. Exp. 
Med. (1994) 179:751, Alam et al., J. Immunol. (1994) 152:1298, Alam et al., J. Exp. Med. 
(1992) 176:781; and for activating neutrophils, Maghazaci et al., Eur. J. Immunol. (1996) 
26:315, and Taub et al., J. Immunol. (1995) 155:3877. Native chemokines can act as mitogens 
for fibroblasts, assayed as described in Mullenbach et al., J. Biol. Chem. (1986) 261:719. 

Native chemokines exhibit binding activity with a number of receptors. Such receptors 
and assays to detect binding are described in, for example, Murphy et al., Science 
(1991) 253:1280; Combadiere et al., J. Biol. Chem. (1995) 270:29671; Daugherty et al., J. Exp. 
Med. (1996) 183:2349; Samson et al., Biochem. (1996) 35:3362; Raport et al., J. Biol. Chem. 
(1996) 271:17161; Combadiere et al., J. Leukoc. Biol. (1996) 60:147; Baba et al., J. Biol. Chem. 
(1997) 25:14893; Yosida et al., J. Biol. Chem. (1997) 272:13803; Arvannitakis et al., Nature 
(1997) 385:347, and other assays are known in the art. 

Assays for kinase activation of chemokines are described by Yen et al., J. Leukoc. Biol. 
(1997) 61:529; Dubois et al., J. Immunol. (1996) 156:1356; Turner et al., J. Immunol. (1995) 
155:2437. Assays for inhibition of angiogenesis or cell proliferation are described in Maione et 
al., Science (1990) 247:77. Glycosaminoglycan production can be induced by native 
chemokines, assayed as described in Castor et al., Proc. Natl. Acad. Sci. (USA) (1983) 80:765. 
Chemokine-mediated histamine release from basophils is assayed as described in Dahinden et 
al., J. Exp. Med. (1989) 170:1787; and White et al., Immunol. Lett. (1989) 22:151. Heparin 
binding is described in Luster et al., J. Exp. Med. (1995) 182:219. 

Chemokines can possess dimerization activity, which can be assayed according to 
Burrows et al., Biochem. (1994) 33:12741; and Zhang et al., Mol. Cell. Biol. (1995) 15:4851. 
Native chemokines can play a role in the inflammatory response of viruses. This activity can be 
assayed as described in Bleul et al., Nature (1996) 382:829; and Oberlin et al., Nature (1996) 
382:833. Exocytosis of monocytes can be promoted by native chemokines. The assay for such 
activity is described in Uguccioni et al., Eur. J. Immunol. (1995) 25:64. Native chemokines also 
can inhibit hematopoietic stem cell proliferation. The method for testing for such activity is 
reported in Graham et al., Nature (1990) 344:442. 

Death Domain Proteins. Several protein families contain death domain motifs (Feinstein 
and Kimchi, TIBS Letters (1995) 20:242). Some death domain containing proteins are 
implicated in cytotoxic intracellular signaling (Cleveland et al., Cell (1995) 81:479, Pan et al., 
Science (1997) 276:111; Duan et al., Nature (1997) 385:86-89, and Chinnaiyan et al., Science 
(1996) 274:990). U.S. Patent No. 5,563,039 describes a protein homologous to TRADD 
(Tumor Necrosis Factor Receptor-1 Associated Death Domain containing protein), and 
modifications of the active domain of TRADD that retain the functional characteristics of the 
protein, as well as apoptosis assays for testing the function of such death domain containing 
proteins. U.S. Patent No. 5,658,883 discloses biologically active TGF-B1 peptides. U.S. Patent 
No. 5,674,734 discloses RIP, which contains a C-terminal death domain and an N-terminal 
kinase domain. 

Leukemia Inhibitory Factor (LIF). An LIF profile is constructed from sequences of
leukemia inhibitory factor, CT-1 (cardiotrophin-1), CNTF (ciliary neurotrophic factor), OSM
(oncostatin M), and IL-6 (interleukin-6). This profile encompasses a family of secreted
cytokines that have pleiotropic effects on many cell types including hepatocytes, osteoclasts,
neuronal cells and cardiac myocytes, and can be used to detect additional genes encoding such
proteins. These molecules are all structurally related and share a common co-receptor, gp130,
which mediates intracellular signal transduction by cytoplasmic tyrosine kinases such as src.

Novel proteins related to this family are also likely to be secreted, to activate gp130, and
to function in the development of a variety of cell types. Thus new members of this family
would be candidates to be developed as growth or survival factors for the cell types that they
stimulate. For more details on this family of cytokines, see Pennica et al., Cytokine and Growth
Factor Reviews (1996) 7:81-91. U.S. Patent No. 5,420,247 discloses LIF receptor and fusion
proteins. U.S. Patent No. 5,443,825 discloses human LIF.

Angiopoietin. Angiopoietin-1 is a secreted ligand of the TIE-2 tyrosine kinase; it
functions as an angiogenic factor critical for normal vascular development. Angiopoietin-2 is a
natural antagonist of angiopoietin-1 and thus functions as an anti-angiogenic factor. These two
proteins are structurally similar and activate the same receptor (Folkman et al., Cell (1996)
57:1153, and Davis et al., Cell (1996) 57:1161). The angiopoietin molecules are composed of
two domains: a coiled-coil region and a region related to fibrinogen. The fibrinogen domain is
found in many molecules including ficolin and tenascin, and is well defined structurally with
many members.

Receptor Protein-Tyrosine Kinases. Receptor Protein-Tyrosine Kinases or RPTKs are
described in Lindberg, Annu. Rev. Cell Biol. (1994) 70:251-337.

Growth Factors: (Epidermal Growth Factor) EGF and (Fibroblast Growth Factor) FGF.
For a discussion of growth factor superfamilies, see Growth Factors: A Practical Approach,
(Appendix A1) (1993) McKay and Leigh, Oxford University Press, NY, 237-243. U.S. Patent
No. 4,444,760 discloses acidic brain fibroblast growth factor, which is active in the promotion 
of cell division and wound healing. U.S. Patent No. 5,439,818 discloses DNA encoding human 
recombinant basic fibroblast growth factor, which is active in wound healing. U.S. Patent No. 
5,604,293 discloses recombinant human basic fibroblast growth factor, which is useful for 
wound healing. U.S. Patent No. 5,410,832 discloses brain-derived and recombinant acidic 
fibroblast growth factor, which act as mitogens for mesoderm and neuroectoderm-derived cells 
in culture, and promote wound healing in soft tissue, cartilaginous tissue and musculo-skeletal 
tissue. U.S. Patent No. 5,387,673 discloses biologically active fragments of FGF. 

Proteins of the TNF Family. A profile derived from the TNF family is created by
aligning sequences of the following TNF family members: nerve growth factor (NGF),
lymphotoxin, Fas ligand, tumor necrosis factor (TNFα), CD40 ligand, TRAIL, ox40 ligand,
4-1BB ligand, CD27 ligand, and CD30 ligand. The profile is designed to identify sequences of
proteins that constitute new members or homologues of this family of proteins. U.S. Patent No.
5,606,023 discloses mutant TNF proteins; U.S. Patent No. 5,597,899 and U.S. Patent No.
5,486,463 disclose TNF muteins; and U.S. Patent No. 5,652,353 discloses DNA encoding TNFα
muteins.

Members of the TNF family of proteins have been shown in vitro to multimerize, as
described in Burrows et al., Biochem. (1994) 33:12741 and Zhang et al., Mol. Cell. Biol. (1995)
75:4851, and to bind receptors, as described in Browning et al., J. Immunol. (1994) 747:1230,
Androlewicz et al., J. Biol. Chem. (1992) 267:2542, and Crowe et al., Science (1994) 264:707.

In vivo, TNFs proteolytically cleave a target protein as described in Kriegler et al., Cell
(1988) 53:45 and Mohler et al., Nature (1994) 370:218 and demonstrate cell proliferation and
differentiation activity. T-cell or thymocyte proliferation is assayed as described in Armitage et
al., Eur. J. Immunol. (1992) 22:447; Current Protocols in Immunology, ed. J.E. Coligan et al.,
3.1-3.19; Takai et al., J. Immunol. (1986) 757:3494-3500; Bertagnoli et al., J. Immunol. (1990)
145:1706; Bertagnoli et al., J. Immunol. (1991) 133327; Bertagnoli et al., J. Immunol. (1992)
149:3778; and Bowman et al., J. Immunol. (1994) 152:1756. B cell proliferation and Ig
secretion are assayed as described in Maliszewski, J. Immunol. (1990) 144:3028; Assays for
B Cell Function: In Vitro Antibody Production, Mond and Brunswick, Current Protocols in
Immunol., Coligan Ed., vol. 1, pp. 3.8.1-3.8.16, John Wiley and Sons, Toronto 1994; Kehrl et al.,
Science (1987) 235:1144; and Boussiotis et al., PNAS USA (1994) 97:7007. Other in vivo
activities include upregulation of cell surface antigens, upregulation of costimulatory molecules,
and cellular aggregation/adhesion as described in Barrett et al., J. Immunol. (1991) 146:1722;
Bjorck et al., Eur. J. Immunol. (1993) 23:1771; Clark et al., Annu. Rev. Immunol. (1991) 9:97;
Ranheim et al., J. Exp. Med. (1994) 777:925; Yellin, J. Immunol. (1994) 753:666; and Grass et
al., Blood (1994) 54:2305.

Proliferation and differentiation of hematopoietic and lymphopoietic cells have also been
shown in vivo for TNFs, using assays for embryonic differentiation and hematopoiesis as
described in Johansson et al., Cellular Biology (1995) 75:141, Keller et al., Mol. Cell. Biol.
(1993) 73:473, and McClanahan et al., Blood (1993) 57:2903, and using assays to detect stem cell
survival and differentiation as described in Culture of Hematopoietic Cells, Freshney et al. eds,
pp 1-21, 23-29, 139-162, 163-179, and 265-268, Wiley-Liss, Inc., New York, NY, 1994, and
Hirajama et al., PNAS USA (1992) 59:5907.

In vivo activities of TNFs also include lymphocyte survival and apoptosis, assayed as
described in Darzynkewicz et al., Cytometry (1992) 73:795; Gorczyca et al., Leukemia (1993)
7:659; Itoh et al., Cell (1991) 56:233; Zacharduk, J. Immunol. (1990) 145:4037; Zamai et al.,
Cytometry (1993) 74:891; and Gorczyca et al., Int'l J. Oncol. (1992) 7:639. Some members of
the TNF family are cleaved from the cell surface; others remain membrane bound. The three-
dimensional structure of TNF is discussed in Sprang and Eck, Tumor Necrosis Factors; supra.

TNF proteins include a transmembrane domain. The protein is cleaved into a shorter
soluble version, as described in Kriegler et al., Cell (1988) 53:45, Perez et al., Cell (1990)
63:251, and Shaw et al., Cell (1986) 46:659. The transmembrane domain is between amino acid
46 and 77 and the cytoplasmic domain is between position 1 and 45 on the human form of
TNFα. The 3-dimensional motifs of TNF include a sandwich of two pleated β sheets. Each
sheet is composed of anti-parallel β strands. β strands facing each other on opposite sides of the
sandwich are connected by short polypeptide loops, as described in Van Ostade et al., Protein
Engineering (1994) 7(1):5, and Sprang et al., Tumor Necrosis Factors; supra. Residues of the
TNF family proteins that are involved in the β sheet secondary structure have been identified as
described in Van Ostade et al., Protein Eng. (1994) 7(1):5, and Sprang et al., supra.

TNF receptors are disclosed in U.S. Patent No. 5,395,760. A profile derived from the 
TNF receptor family is created by aligning sequences of the TNF receptor family, including 
Apo1/Fas, TNFR I and II, death receptor 3 (DR3), CD40, ox40, CD27, and CD30. Thus, the
profile is designed to identify from the polynucleotides of the invention sequences of proteins 
that constitute new members or homologues of this family of proteins. 

Tumor necrosis factor receptors exist in two forms in humans: p55 TNFR and p75 
TNFR, both of which provide intracellular signals upon binding with a ligand. The extracellular 
domains of these receptor proteins are cysteine rich. The receptors can remain membrane 
bound, although some forms of the receptors are cleaved forming soluble receptors. The 
regulation and the diagnostic, prognostic, and therapeutic value of soluble TNF receptors are discussed in
Aderka, Cytokine and Growth Factor Reviews (1996) 7(3):231.

PDGF Family. U.S. Patent No. 5,326,695 discloses platelet derived growth factor 
agonists; bioactive portions of PDGF-B are used as agonists. U.S. Patent No. 4,845,075 
discloses biologically active B-chain homodimers, and also includes variants and derivatives of 
the PDGF-B chain. U.S. Patent No. 5,128,321 discloses PDGF analogs and methods of use. 
Proteins having the same bioactivity as PDGF are disclosed, including A and B chain proteins. 

Kinase (Including MKK) Family. U.S. Patent No. 5,650,501 discloses serine/threonine 
kinase, associated with mitotic and meiotic cell division; the protein has a kinase domain in its 
N-terminus and 3 PEST regions in the C-terminus. U.S. Patent No. 5,605,825 discloses human
PAK65, a serine protein kinase.

The foregoing discussion provides a few examples of the protein profiles that can be
compared with the polynucleotides of the invention. One skilled in the art can use these and
other protein profiles to identify the genes that correlate with the provided polynucleotides.

C. Identification of Secreted & Membrane-Bound Polypeptides

Both secreted and membrane-bound polypeptides of the present invention are of
particular interest. For example, levels of secreted polypeptides can be assayed in body fluids
that are convenient, such as blood, urine, prostatic fluid and semen. Membrane-bound
polypeptides are useful for constructing vaccine antigens or inducing an immune response.
Such antigens would comprise all or part of the extracellular region of the membrane-bound
polypeptides. Because both secreted and membrane-bound polypeptides comprise a fragment of
contiguous hydrophobic amino acids, hydrophobicity predicting algorithms can be used to
identify such polypeptides.

A signal sequence is usually encoded by both secreted and membrane-bound polypeptide
genes to direct a polypeptide to the surface of the cell. The signal sequence usually comprises a
stretch of hydrophobic residues. Such signal sequences can fold into helical structures.
Membrane-bound polypeptides typically comprise at least one transmembrane region that
possesses a stretch of hydrophobic amino acids that can traverse the membrane. Some
transmembrane regions also exhibit a helical structure. Hydrophobic fragments within a
polypeptide can be identified by using computer algorithms. Such algorithms include Hopp &
Woods, Proc. Natl. Acad. Sci. USA (1981) 78:3824-3828; Kyte & Doolittle, J. Mol. Biol. (1982)
157:105-132; and the RAOAR algorithm, Degli Esposti et al., Eur. J. Biochem. (1990) 190:207-
219.
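
By way of illustration only, a sliding-window hydropathy scan of the kind introduced by Kyte & Doolittle can be sketched as follows. The scale values are the published Kyte & Doolittle hydropathy indices; the function names, the 19-residue window and the 1.6 cutoff are assumptions chosen for this sketch (they are commonly used for transmembrane-segment screening) and are not parameters specified herein.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the protein.

    The window length is an illustrative assumption. Residues outside the
    20 standard amino acids raise KeyError.
    """
    protein = protein.upper()
    if len(protein) < window:
        return []
    scores = [KD[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def hydrophobic_segments(protein, window=19, cutoff=1.6):
    """Return (start, end, mean) for each window whose mean is >= cutoff.

    start is 0-based and end is exclusive; the cutoff is an assumption.
    """
    return [(i, i + window, score)
            for i, score in enumerate(hydropathy_profile(protein, window))
            if score >= cutoff]
```

Overlapping windows that pass the cutoff would ordinarily be merged into a single candidate region before further analysis; that step is omitted here for brevity.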

Another method of identifying secreted and membrane-bound polypeptides is to translate 
the polynucleotides of the invention in all six frames and determine if at least 8 contiguous 
hydrophobic amino acids are present. Those translated polypeptides with at least 8, more
typically 10, and even more typically 12 contiguous hydrophobic amino acids are considered to be
putative secreted or membrane-bound polypeptides. Hydrophobic amino acids include
alanine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, 
threonine, tryptophan, tyrosine, and valine. 
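
The six-frame screen just described can be sketched, for illustration, as follows. The sketch assumes the standard genetic code and uses the hydrophobic amino acid list recited above; the function names are hypothetical, codons containing characters other than A, C, G or T are rendered as "X", and the default run length of 8 can be raised to 10 or 12.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# One-letter codes for the hydrophobic amino acids listed in the text:
# Ala, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Thr, Trp, Tyr, Val.
HYDROPHOBIC = set("AGHILKMFPTWYV")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


def hydrophobic_runs(protein, min_len=8):
    """Return (start, run) for each stretch of >= min_len hydrophobic residues."""
    pattern = re.compile("[%s]{%d,}" % ("".join(sorted(HYDROPHOBIC)), min_len))
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


def putative_secreted_or_membrane(dna, min_len=8):
    """Map each reading frame that contains a qualifying run to its runs."""
    hits = {}
    for frame, protein in six_frame_translations(dna).items():
        runs = hydrophobic_runs(protein, min_len)
        if runs:
            hits[frame] = runs
    return hits
```

A polynucleotide for which `putative_secreted_or_membrane` returns a non-empty result in any frame would be flagged as a candidate; stop codons (shown as "*") interrupt a run because "*" is not in the hydrophobic set.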

IV. Identification of the Function of an Expression Product of a Full-Length Gene
Corresponding to a Polynucleotide

Ribozymes, antisense constructs, and dominant negative mutants can be used to 
determine function of the expression product of a gene corresponding to a polynucleotide 
provided herein. These methods and compositions are particularly useful where the provided 
novel polynucleotide exhibits no significant or substantial homology to a sequence encoding a 
gene of known function. Antisense molecules and ribozymes can be constructed from synthetic 
polynucleotides. Typically, the phosphoramidite method of oligonucleotide synthesis is used. 
See Beaucage et al., Tet. Lett. (1981) 22:1859 and U.S. Patent No. 4,668,777. Automated
devices for synthesis are available to create oligonucleotides using this chemistry. Examples of 
such devices include Biosearch 8600, Models 392 and 394 by Applied Biosystems, a division of 
Perkin-Elmer Corp., Foster City, California, USA; and Expedite by Perceptive Biosystems, 
Framingham, Massachusetts, USA. Synthetic RNA, phosphate analog oligonucleotides, and 
chemically derivatized oligonucleotides can also be produced, and can be covalently attached to 
other molecules. RNA oligonucleotides can be synthesized, for example, using RNA 
phosphoramidites. This method can be performed on an automated synthesizer, such as Applied 
Biosystems, Models 392 and 394, Foster City, California, USA. See Applied Biosystems User 
Bulletin 53 and Ogilvie et al., Pure & Applied Chem. (1987) 59:325.

Phosphorothioate oligonucleotides can also be synthesized for antisense construction. A 
sulfurizing reagent, such as tetraethylthiuram disulfide (TETD) in acetonitrile, can be used to
convert the internucleotide cyanoethyl phosphite to the phosphorothioate triester within 15
minutes at room temperature. TETD replaces the iodine reagent, while all other reagents used
for standard phosphoramidite chemistry remain the same. Such a synthesis method can be 
automated using Models 392 and 394 by Applied Biosystems, for example. 

Oligonucleotides of up to 200 nucleotides can be synthesized, more typically 100
nucleotides, more typically 50 nucleotides, and even more typically 30 to 40 nucleotides. These
synthetic fragments can be annealed and ligated together to construct larger fragments. See, for
example, Sambrook et al., supra.

A. Ribozymes

Trans-cleaving catalytic RNAs (ribozymes) are RNA molecules possessing 
endoribonuclease activity. Ribozymes are specifically designed for a particular target, and the 
target message must contain a specific nucleotide sequence. They are engineered to cleave any 
RNA species site-specifically in the background of cellular RNA. The cleavage event renders 
the mRNA unstable and prevents protein expression. Importantly, ribozymes can be used to 
inhibit expression of a gene of unknown function for the purpose of determining its function in 
an in vitro or in vivo context, by detecting the phenotypic effect. 

One commonly used ribozyme motif is the hammerhead, for which the substrate
sequence requirements are minimal. Design of the hammerhead ribozyme is disclosed in
Usman et al., Current Opin. Struct. Biol. (1996) 6:521. Usman also discusses the therapeutic
uses of ribozymes. Ribozymes can also be prepared and used as described in Long et al.,
FASEB J. (1993) 7:25; Symons, Ann. Rev. Biochem. (1992) 57:641; Perrotta et al., Biochem.
(1992) 31:16; Ojwang et al., Proc. Natl. Acad. Sci. (USA) (1992) 89:10802; and U.S. Patent
No. 5,254,678. Ribozyme cleavage of HIV-1 RNA is described in U.S. Patent No. 5,144,019;
methods of cleaving RNA using ribozymes are described in U.S. Patent No. 5,116,742; and
methods for increasing the specificity of ribozymes are described in U.S. Patent No. 5,225,337
and Koizumi et al., Nucleic Acids Res. (1989) 77:7059. Preparation and use of ribozyme
fragments in a hammerhead structure are also described by Koizumi et al., Nucleic Acids Res.
(1989) 77:7059. Preparation and use of ribozyme fragments in a hairpin structure are described
by Chowrira and Burke, Nucleic Acids Res. (1992) 20:2835. Ribozymes can also be made by
rolling transcription as described in Daubendiek and Kool, Nat. Biotechnol. (1997) 15(3):273.

The hybridizing region of the ribozyme can be modified or can be prepared as a
branched structure as described in Horn and Urdea, Nucleic Acids Res. (1989) 77:6959. The
basic structure of the ribozymes can also be chemically altered in ways familiar to those skilled
in the art, and chemically synthesized ribozymes can be administered as synthetic
oligonucleotide derivatives modified by monomeric units. In a therapeutic context, liposome
mediated delivery of ribozymes improves cellular uptake, as described in Birikh et al., Eur. J.
Biochem. (1997) 245:1.

Using the polynucleotide sequences of the invention and methods known in the art,
ribozymes are designed to specifically bind and cut the corresponding mRNA species.
Ribozymes thus provide a means to inhibit the expression of any of the proteins encoded by the
disclosed polynucleotides or their full-length genes. The full-length gene need not be known in
order to design and use specific inhibitory ribozymes. In the case of a polynucleotide or full-
length cDNA of unknown function, ribozymes corresponding to that nucleotide sequence can be
tested in vitro for efficacy in cleaving the target transcript. Those ribozymes that effect cleavage
in vitro are further tested in vivo. The ribozyme can also be used to generate an animal model
for a disease, as described in Birikh et al., supra. An effective ribozyme is used to determine
the function of the gene of interest by blocking its expression and detecting a change in the
cell. Where the gene is found to be a mediator in a disease, an effective ribozyme is designed
and delivered in a gene therapy for blocking expression of the gene.

Therapeutic and functional genomic applications of ribozymes begin with
knowledge of a portion of the coding sequence of the gene to be inhibited. Thus, for many
genes, a partial polynucleotide sequence provides adequate sequence for constructing an 
effective ribozyme. A target cleavage site is selected in the target sequence, and a ribozyme is 
constructed based on the 5' and 3' nucleotide sequences that flank the cleavage site. Retroviral 
vectors are engineered to express monomeric and multimeric hammerhead ribozymes targeting 
the mRNA of the target coding sequence. These monomeric and multimeric ribozymes are 
tested in vitro for an ability to cleave the target mRNA. A cell line is stably transduced with the 
retroviral vectors expressing the ribozymes, and the transduction is confirmed by Northern blot 
analysis and reverse-transcription polymerase chain reaction (RT-PCR). The cells are screened 
for inactivation of the target mRNA by such indicators as reduction of expression of disease 
markers or reduction of the gene product of the target mRNA. 
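
The site-selection step described above can be illustrated with the following sketch, which locates candidate cleavage triplets in a target sequence and reports the flanking 5' and 3' sequences together with the complementary binding arms. The choice of GUC as the triplet and an arm length of 8 nucleotides are assumptions made for this sketch (GUC is a commonly cited hammerhead target triplet; neither value is specified herein), the function and key names are hypothetical, and the catalytic core of the ribozyme is deliberately omitted.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def candidate_cleavage_sites(mrna, triplet="GUC", arm_len=8):
    """List candidate sites that have full-length flanks on both sides.

    mrna may be given as RNA or as the corresponding cDNA sense strand
    (T is converted to U). Positions are 0-based offsets of the triplet.
    """
    mrna = mrna.upper().replace("T", "U")
    sites = []
    start = mrna.find(triplet)
    while start != -1:
        end = start + len(triplet)
        if start >= arm_len and end + arm_len <= len(mrna):
            five = mrna[start - arm_len:start]
            three = mrna[end:end + arm_len]
            sites.append({
                "position": start,
                "five_prime_flank": five,
                "three_prime_flank": three,
                # Ribozyme arms are complementary to the target flanks.
                "arm_for_5p_flank": rna_reverse_complement(five),
                "arm_for_3p_flank": rna_reverse_complement(three),
            })
        start = mrna.find(triplet, start + 1)
    return sites
```

In practice each candidate would still be tested in vitro for cleavage of the target transcript, as described above, since sequence alone does not establish accessibility of the site.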

B. Antisense 

Antisense nucleic acids are designed to specifically bind to RNA, resulting in the 
formation of RNA-DNA or RNA-RNA hybrids, with an arrest of DNA replication, reverse 
transcription or messenger RNA translation. Antisense polynucleotides based on a selected 
polynucleotide sequence can interfere with expression of the corresponding gene. Antisense
polynucleotides are typically generated within the cell by expression from antisense constructs
that contain the antisense strand as the transcribed strand. Antisense polynucleotides based on 
the disclosed polynucleotides will bind and/or interfere with the translation of mRNA 
comprising a sequence complementary to the antisense polynucleotide. The expression products 
of control cells and cells treated with the antisense construct are compared to detect the protein 
product of the gene corresponding to the polynucleotide upon which the antisense construct is 
based. The protein is isolated and identified using routine biochemical methods. 
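
As a simple illustration of the complementarity on which antisense binding relies, the following sketch derives a DNA antisense oligonucleotide from a stretch of target mRNA and checks whether a given oligonucleotide has a fully complementary site in a transcript. The function names are hypothetical, and perfect Watson-Crick pairing is assumed; mismatches, secondary structure and chemical modifications are not modeled.

```python
# Watson-Crick partners, giving DNA bases for the antisense strand.
PAIR = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(target):
    """Return the DNA oligo (5'->3') complementary to a target written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(target.upper()))


def binds(antisense, mrna):
    """True if mrna contains a site fully complementary to the antisense oligo."""
    site = antisense_dna(antisense).replace("T", "U")
    return site in mrna.upper().replace("T", "U")
```

For example, `antisense_dna("AUGGCC")` gives `"GGCCAT"`, and `binds("GGCCAT", "GGAUGGCCUU")` evaluates to `True` because the transcript contains the complementary stretch AUGGCC.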

One rationale for using antisense methods to determine the function of the gene 
corresponding to a disclosed polynucleotide is the biological activity of antisense therapeutics. 
Antisense therapy for a variety of cancers is in clinical phase and has been discussed extensively
in the literature. Reed reviewed antisense therapy directed at the Bcl-2 gene in tumors; gene
transfer-mediated overexpression of Bcl-2 in tumor cell lines conferred resistance to many types
of cancer drugs (Reed, J.C., N.C.I. (1997) 89:988). The potential for clinical development of
antisense inhibitors of ras is discussed by Cowsert, L.M., Anti-Cancer Drug Design (1997)
72:359. Additional important antisense targets include leukemia (Geurtz, A.M., Anti-Cancer
Drug Design (1997) 72:341); human C-raf kinase (Monia, B.P., Anti-Cancer Drug Design
(1997) 72:327); and protein kinase C (McGraw et al., Anti-Cancer Drug Design (1997) 72:315).

Given the extensive background literature and clinical experience in antisense therapy, 
one skilled in the art can use selected polynucleotides of the invention as additional potential 
therapeutics. The choice of polynucleotide can be narrowed by first testing them for binding to 
"hot spot" regions of the genome of cancerous cells. If a polynucleotide is identified as binding 
to a "hot spot", testing the polynucleotide as an antisense compound in the corresponding cancer 
cells clearly is warranted. 

Ogunbiyi et al., Gastroenterology (1997) 113(3):761 describe prognostic use of allelic
loss in colon cancer; Barks et al., Genes, Chromosomes, and Cancer (1997) 19(4):278 describe
increased chromosome copy number detected by FISH in malignant melanoma; Nishizake et al.,
Genes, Chromosomes, and Cancer (1997) 19(4):267 describe genetic alterations in primary
breast cancer and their metastases and direct comparison using modified comparative genome
hybridization; and Elo et al., Cancer Research (1997) 57(16):3356 disclose that loss of
heterozygosity at 16q24.1-q24.2 is significantly associated with metastatic and aggressive
behavior of prostate cancer.

C. Dominant Negative Mutations 

As an alternative method for identifying function of the gene corresponding to a 
polynucleotide disclosed herein, dominant negative mutations are readily generated for 
corresponding proteins that are active as homomultimers. A mutant polypeptide will interact 
with wild-type polypeptides (made from the other allele) and form a non-functional multimer. 
Thus, the mutation can be in a substrate-binding domain, a catalytic domain, or a cellular localization
domain. Preferably, the mutant polypeptide will be overproduced. Point mutations are made
that have such an effect. In addition, fusion of different polypeptides of various lengths to the
terminus of a protein can yield dominant negative mutants. General strategies are available for
making dominant negative mutants (see, e.g., Herskowitz, Nature (1987) 329:219). Such
techniques can be used to create loss of function mutations, which are useful for determining 
protein function. 

V. Construction of Polypeptides of the Invention and Variants Thereof 

The polypeptides of the invention include those encoded by the disclosed 
polynucleotides. These polypeptides can also be encoded by nucleic acids that, by virtue of the 
degeneracy of the genetic code, are not identical in sequence to the disclosed polynucleotides. 
Thus, the invention includes within its scope a polypeptide encoded by a polynucleotide having 
the sequence of any one of SEQ ID NOS: 1-844 or a variant thereof. 

In general, the term "polypeptide" as used herein refers to the full length
polypeptide encoded by the recited polynucleotide, the polypeptide encoded by the gene
represented by the recited polynucleotide, as well as portions or fragments thereof.
"Polypeptides" also includes variants of the naturally occurring proteins, where such variants
are homologous or substantially similar to the naturally occurring protein, and can be of an
origin of the same or different species as the naturally occurring protein (e.g., human, murine, or
some other species that naturally expresses the recited polypeptide, usually a mammalian
species). In general, variant polypeptides have a sequence that has at least about 80%, usually at
least about 90%, and more usually at least about 98% sequence identity with a differentially
expressed polypeptide of the invention, as measured by BLAST using the parameters described
above. The variant polypeptides can be naturally or non-naturally glycosylated, i.e., the
polypeptide has a glycosylation pattern that differs from the glycosylation pattern found in the
corresponding naturally occurring protein.
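
For illustration, the identity thresholds recited above can be applied to a pair of sequences that have already been aligned, as in the following sketch. This is a simplified stand-in and not the BLAST calculation itself: it assumes two equal-length, pre-aligned strings with "-" marking gaps, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue.

    Columns that are gaps in both sequences are ignored; a column with a gap
    in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def variant_tier(identity):
    """Return the highest recited threshold (98, 90 or 80) met, else None."""
    for threshold in (98, 90, 80):
        if identity >= threshold:
            return threshold
    return None
```

Two ten-residue sequences differing at one position score 90.0 and fall in the "at least about 90%" tier; a score below 80 returns `None`.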

The invention also encompasses homologs of the disclosed polypeptides (or fragments
thereof) where the homologs are isolated from other species, i.e., other animal or plant species.
Such homologs are usually from mammalian species, e.g., rodents, such as mice and rats; domestic
animals, e.g., horse, cow, dog, cat; and humans. By homolog is meant a polypeptide having at
least about 35%, usually at least about 40% and more usually at least about 60% amino acid
sequence identity to a particular differentially expressed protein as identified above, where
sequence identity is determined using the BLAST algorithm, with the parameters described
supra.

In general, the polypeptides of the subject invention are provided in a non-naturally 
occurring environment, e.g. are separated from their naturally occurring environment. In certain 
embodiments, the subject protein is present in a composition that is enriched for the protein as 
compared to a control. As such, purified polypeptide is provided, where by purified is meant 
that the protein is present in a composition that is substantially free of non-differentially 
expressed polypeptides, where by substantially free is meant that less than 90%, usually less 
than 60% and more usually less than 50% of the composition is made up of non-differentially 
expressed polypeptides. 

Also within the scope of the invention are variants; variants of polypeptides include 
mutants, fragments, and fusions. Mutants can include amino acid substitutions, additions or 
deletions. The amino acid substitutions can be conservative amino acid substitutions or 
substitutions to eliminate non-essential amino acids, such as to alter a glycosylation site, a 
phosphorylation site or an acetylation site, or to minimize misfolding by substitution or deletion 
of one or more cysteine residues that are not necessary for function. Conservative amino acid 
substitutions are those that preserve the general charge, hydrophobicity/hydrophilicity, and/or 
steric bulk of the amino acid substituted. For example, substitutions between the following
groups are conservative: Gly/Ala, Val/Ile/Leu, Asp/Glu, Lys/Arg, Asn/Gln, Ser/Cys/Thr, and
Phe/Trp/Tyr.
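
The conservative substitution groups listed above lend themselves to a simple lookup, sketched below for illustration. The function names are hypothetical, and the sketch assumes that Thr belongs with Ser and Cys in a single group; a substitution is treated as conservative only if both residues fall within the same group.

```python
# Conservative groups from the text, as one-letter codes.
# Assumption: Thr is grouped with Ser and Cys.
CONSERVATIVE_GROUPS = [
    {"G", "A"},
    {"V", "I", "L"},
    {"D", "E"},
    {"K", "R"},
    {"N", "Q"},
    {"S", "C", "T"},
    {"F", "W", "Y"},
]


def is_conservative(original, replacement):
    """True if two different residues fall within the same conservative group."""
    a, b = original.upper(), replacement.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(wild_type, variant):
    """Return (position, original, replacement, conservative?) per difference.

    Positions are 1-based; both sequences must have the same length.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length")
    return [(i + 1, w, v, is_conservative(w, v))
            for i, (w, v) in enumerate(zip(wild_type.upper(), variant.upper()))
            if w != v]
```

For instance, comparing `"MVDK"` with `"MLDE"` reports a conservative Val-to-Leu change at position 2 and a non-conservative Lys-to-Glu change at position 4.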

Variants can be designed so as to retain biological activity of a particular region of the 
protein (e.g., a functional domain and/or, where the polypeptide is a member of a protein family, 
a region associated with a consensus sequence). In a non-limiting example, Osawa et al.,
Biochem. Mol. Int. (1994) 34:1003, discusses the actin binding region of a protein from several
different species. The actin binding regions of these species are considered homologous
based on the fact that they have amino acids that fall within "homologous residue groups."
Homologous residues are judged according to the following groups (using single letter amino
acid designations): STAG; ILVMF; HRK; DEQN; and FYW. For example, an S, a T, an A or
a G can be in a position and the function (in this case actin binding) is retained.

Additional guidance on amino acid substitution is available from studies of protein 
evolution. Go et al., Int. J. Peptide Protein Res. (1980) 75:211, classified amino acid residue
sites as interior or exterior depending on their accessibility. More frequent substitution on 
exterior sites was confirmed to be general in eight sets of homologous protein families 
regardless of their biological functions and the presence or absence of a prosthetic group. 
Virtually all types of amino acid residues had higher mutabilities on the exterior than in the 
interior. No correlation between mutability and polarity was observed of amino acid residues in 
the interior and exterior, respectively. Amino acid residues were classified into one of three 
groups depending on their polarity: polar (Arg, Lys, His, Gln, Asn, Asp, and Glu); weak polar
(Ala, Pro, Gly, Thr, and Ser), and nonpolar (Cys, Val, Met, Ile, Leu, Phe, Tyr, and Trp). Amino
acid replacements during protein evolution were very conservative: 88% and 76% of them in 
the interior or exterior, respectively, were within the same group of the three. Inter-group 
replacements are such that weak polar residues are replaced more often by nonpolar residues in 
the interior and more often by polar residues on the exterior. 

Additional guidance for production of polypeptide variants is provided in Querol et al.,
Prot. Eng. (1996) 9:265, which provides general rules for amino acid substitutions to enhance
protein thermostability. New glycosylation sites can be introduced as discussed in Olsen and
Thomsen, J. Gen. Microbiol. (1991) 757:579. An additional disulfide bridge can be introduced,
as discussed by Perry and Wetzel, Science (1984) 225:555; Pantoliano et al., Biochemistry
(1987) 26:2077; Matsumura et al., Nature (1989) 342:291; Nishikawa et al., Protein Eng.
(1990) 3:443; Takagi et al., J. Biol. Chem. (1990) 265:6874; Clarke et al., Biochemistry (1993)
32:4322; and Wakarchuk et al., Protein Eng. (1994) 7:1379. Metal binding sites can be
introduced, according to Toma et al., Biochemistry (1991) 30:97, and Haezerbrouck et al.,
Protein Eng. (1993) 6:643. Substitutions with prolines in loops can be made according to Masul
et al., Appl. Env. Microbiol. (1994) 60:3579; and Hardy et al., FEBS Lett. 377:89.

Cysteine-depleted muteins are considered variants within the scope of the invention. 
These variants can be constructed according to methods disclosed in U.S. Patent No. 4,959,314, 
which discloses substitution of cysteines with other amino acids, and methods for assaying 
biological activity and effect of the substitution. Such methods are suitable for proteins 
according to this invention that have cysteine residues suitable for such substitutions, for 
example to eliminate disulfide bond formation. 

Variants also include fragments of the polypeptides disclosed herein, particularly 
biologically active fragments and/or fragments corresponding to functional domains. 
Fragments of interest will typically be at least about 10 aa to at least about 15 aa in length,
usually at least about 50 aa in length, and can be as long as 300 aa in length or longer, but will
usually not exceed about 1000 aa in length, where the fragment will have a stretch of amino
acids that is identical to a polypeptide encoded by a polynucleotide having a sequence of any
one of SEQ ID NOS: 1-844, or a homolog thereof.

The protein variants described herein are encoded by polynucleotides that are within the 
scope of the invention. The genetic code can be used to select the appropriate codons to 
construct the corresponding variants. 
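
By way of illustration only, the following minimal Python sketch shows how the genetic code can be used to select codons for a variant polypeptide. The choice of one codon per amino acid, the function names, and the example peptide are hypothetical assumptions made for the sketch; in practice codon selection would normally take the codon usage of the expression host into account.

```python
# One codon per amino acid, taken from the standard genetic code.
# Which synonymous codon is listed for each residue is an arbitrary,
# illustrative assumption and not a recommendation.
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def substitute(peptide, position, new_residue):
    """Return the peptide with the residue at a 1-based position replaced."""
    if not 1 <= position <= len(peptide):
        raise ValueError("position lies outside the peptide")
    return peptide[:position - 1] + new_residue + peptide[position:]


def back_translate(peptide):
    """Return one DNA sequence that encodes the given peptide."""
    return "".join(CODON_FOR[residue] for residue in peptide)


# Hypothetical example: replace the cysteine at position 3 with serine.
variant = substitute("MKCLS", 3, "S")
coding = back_translate(variant)
```

Because the genetic code is degenerate, the sequence returned is only one of many polynucleotides that encode the same variant.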

VI. Computer-Related Embodiments 

In general, a library of polynucleotides is a collection of sequence information, which 
information is provided in either biochemical form (e.g., as a collection of polynucleotide 
molecules), or in electronic form (e.g., as a collection of polynucleotide sequences stored in a 
computer-readable form, as in a computer system and/or as part of a computer program). The 
sequence information of the polynucleotides can be used in a variety of ways, e.g., as a resource 
for gene discovery, as a representation of sequences expressed in a selected cell type (e.g., cell 
type markers), and/or as markers of a given disease or disease state. In general, a disease marker 
is a representation of a gene product that is present in all cells affected by disease either at an 
increased or decreased level relative to a normal cell (e.g., a cell of the same or similar type that 
is not substantially affected by disease). For example, a polynucleotide sequence in a library 
can be a polynucleotide that represents an mRNA, polypeptide, or other gene product encoded 
by the polynucleotide, that is either overexpressed or underexpressed in a breast ductal cell 
affected by cancer relative to a normal (i.e., substantially disease-free) breast cell. 

The nucleotide sequence information of the library can be embodied in any suitable 
form, e.g., electronic or biochemical forms. For example, a library of sequence information 
embodied in electronic form includes an accessible computer data file (or, in biochemical form, 
a collection of nucleic acid molecules) that contains the representative nucleotide sequences of 
genes that are differentially expressed (e.g., overexpressed or underexpressed) as between, for 
example, i) a cancerous cell and a normal cell; ii) a cancerous cell and a dysplastic cell; iii) a 
cancerous cell and a cell affected by a disease or condition other than cancer; iv) a metastatic 
cancerous cell and a normal cell and/or non-metastatic cancerous cell; v) a malignant cancerous 
cell and a non-malignant cancerous cell (or a normal cell) and/or vi) a dysplastic cell relative to 
a normal cell. Other combinations and comparisons of cells affected by various diseases or 
stages of disease will be readily apparent to the ordinarily skilled artisan. Biochemical 
embodiments of the library include a collection of nucleic acids that have the sequences of the 
genes in the library, where the nucleic acids can correspond to the entire gene in the library or to 
a fragment thereof, as described in greater detail below. 

The polynucleotide libraries of the subject invention include sequence information of a 
plurality of polynucleotide sequences, where at least one of the polynucleotides has a sequence 
of any of SEQ ID NOS: 1-844. By plurality is meant at least 2, usually at least 3 and can include 
up to all of SEQ ID NOS:1-844. The length and number of polynucleotides in the library will 
vary with the nature of the library, e.g., if the library is an oligonucleotide array, a cDNA array, 
a computer database of the sequence information, etc. 

Where the library is an electronic library, the nucleic acid sequence information can be 
present in a variety of media. "Media" refers to a manufacture, other than an isolated nucleic 
acid molecule, that contains the sequence information of the present invention. Such a 
manufacture provides the genome sequence or a subset thereof in a form that can be examined 
by means not directly applicable to the sequence as it exists in a nucleic acid. For example, the 
nucleotide sequence of the present invention, e.g. the nucleic acid sequences of any of the 
polynucleotides of SEQ ID NOS: 1-844, can be recorded on computer readable media, e.g. any 
medium that can be read and accessed directly by a computer. Such media include, but are not 
limited to: magnetic storage media, such as a floppy disc, a hard disc storage medium, and a 
magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM 
and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill 
in the art can readily appreciate how any of the presently known computer readable mediums 
can be used to create a manufacture comprising a recording of the present sequence information. 
"Recorded" refers to a process for storing information on computer readable medium, using any 
such methods as known in the art. Any convenient data storage structure can be chosen, based 
on the means used to access the stored information. A variety of data processor programs and 
formats can be used for storage, e.g. word processing text file, database format, etc. In addition 
to the sequence information, electronic versions of the libraries of the invention can be provided 
in conjunction or connection with other computer-readable information and/or other types of 
computer-readable files (e.g., searchable files, executable files, etc, including, but not limited to, 
for example, search program software, etc.). 
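
As a non-limiting sketch of one convenient data storage structure, the Python fragment below reads sequence records held in a plain-text file format (FASTA-style records are assumed here purely for illustration) into an in-memory dictionary keyed by identifier. The record names and sequences are placeholders and do not correspond to any sequence of the Sequence Listing.

```python
def parse_fasta(text):
    """Parse FASTA-style text into a dict mapping identifier to sequence."""
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first word of the header as the identifier.
            name = fields[0] if fields else "record_%d" % (len(records) + 1)
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


# Placeholder records for illustration only.
EXAMPLE = """>seq_A example record
ACGTAC
GTTA
>seq_B
ttgaca
"""

library = parse_fasta(EXAMPLE)
```

A database format or other structure could equally be used; the dictionary is chosen only because it makes lookup by identifier simple.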

By providing the nucleotide sequence in computer readable form, the information can be 
accessed for a variety of purposes. Computer software to access sequence information is 
publicly available. For example, the BLAST (Altschul et al., supra) and BLAZE (Brutlag et al., 
Comp. Chem. (1993) 17:203) search algorithms on a Sybase system can be used to identify open 
reading frames (ORFs) within the genome that contain homology to ORFs from other 
organisms. 

As used herein, "a computer-based system" refers to the hardware means, software 
means, and data storage means used to analyze the nucleotide sequence information of the 
present invention. The minimum hardware of the computer-based systems of the present 
invention comprises a central processing unit (CPU), input means, output means, and data 
storage means. A skilled artisan can readily appreciate that any one of the currently available 
computer-based systems is suitable for use in the present invention. The data storage means can 
comprise any manufacture comprising a recording of the present sequence information as 
described above, or a memory access means that can access such a manufacture. 

"Search means" refers to one or more programs implemented on the computer-based 
system, to compare a target sequence or target structural motif with the stored sequence 
information. Search means are used to identify fragments or regions of the genome that match a 
particular target sequence or target motif. A variety of algorithms are publicly known 
and commercially available, e.g. MacPattern (EMBL), BLASTN and BLASTX (NCBI). A 
"target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or 
more amino acids, preferably from about 10 to 100 amino acids or from about 30 to 300 
nucleotide residues. 

A "target structural motif," or "target motif," refers to any rationally selected sequence or 
combination of sequences in which the sequence(s) are chosen based on a three-dimensional 
configuration that is formed upon the folding of the target motif, or on consensus sequences of 
regulatory or active sites. There are a variety of target motifs known in the art. Protein target 
motifs include, but are not limited to, enzyme active sites and signal sequences. Nucleic acid 
target motifs include, but are not limited to, hairpin structures, promoter sequences and other 
expression elements such as binding sites for transcription factors. 
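
For illustration, a consensus-type target motif can be expressed as a regular expression and located in stored sequence information, as in the minimal Python sketch below. The pattern shown is a TATA-box-like consensus chosen only as an example, and the function name and test sequence are hypothetical.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) pairs, 0-based, including overlapping hits."""
    # A lookahead with a capturing group reports overlapping occurrences.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


# Illustrative consensus only: T-A-T-A, then A or T, then A, then A or T.
TATA_LIKE = "TATA[AT]A[AT]"

hits = find_motif("GGCTATAAAAGGC", TATA_LIKE)
```

Motifs defined by three-dimensional configuration rather than by consensus sequence would require a different kind of search and are not addressed by this sketch.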

A variety of structural formats for the input and output means can be used to input and 
output the information in the computer-based systems of the present invention. One format for 
an output means ranks fragments of the genome possessing varying degrees of homology to a 
target sequence or target motif. Such presentation provides a skilled artisan with a ranking of 
sequences and identifies the degree of sequence similarity contained in the identified fragment. 
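
The ranked output format described above can be sketched as follows. This minimal Python example scores each stored sequence by the best ungapped identity to the target over all placements and sorts the results; it is a deliberately simple stand-in for a homology search program such as BLAST, not an implementation of one, and all names and sequences are hypothetical.

```python
def best_window_identity(target, subject):
    """Return (best identity fraction, 0-based start) over ungapped placements."""
    n = len(target)
    if n == 0 or len(subject) < n:
        return 0.0, -1
    best, best_start = -1.0, -1
    for start in range(len(subject) - n + 1):
        window = subject[start:start + n]
        matches = sum(a == b for a, b in zip(target, window))
        score = matches / n
        if score > best:  # keep the first placement with the highest score
            best, best_start = score, start
    return best, best_start


def rank_hits(target, library):
    """Return (name, identity, start) rows sorted from most to least similar."""
    rows = []
    for name, sequence in library.items():
        identity, start = best_window_identity(target, sequence)
        rows.append((name, identity, start))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Placeholder fragments for illustration only.
fragments = {
    "frag_1": "TTACGTACGGA",
    "frag_2": "TTACGAACGGA",
    "frag_3": "GGGGGGGGGGG",
}
ranking = rank_hits("ACGTACG", fragments)
```

Each row of the result pairs a fragment with its degree of similarity to the target, which is the information the ranked output is intended to convey.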

A variety of comparing means can be used to compare a target sequence or target motif 
with the data storage means to identify sequence fragments of the genome. A skilled artisan can 
readily recognize that any one of the publicly available homology search programs can be used 
as the search means for the computer based systems of the present invention. 

As discussed above, the "library" of the invention also encompasses biochemical 
libraries of the polynucleotides of SEQ ID NOS: 1-844, e.g., collections of nucleic acids 
representing the provided polynucleotides. The biochemical libraries can take a variety of 
forms, e.g., a solution of cDNAs, a pattern of probe nucleic acids stably associated with a 
surface of a solid support (i.e., an array) and the like. Of particular interest are nucleic acid 
arrays in which one or more of SEQ ID NOS:1-844 is represented on the array. By array is 
meant an article of manufacture that has at least a substrate with at least two distinct nucleic 
acid targets on one of its surfaces, where the number of distinct nucleic acids can be 
considerably higher, typically being at least 10 nt, usually at least 20 nt and often at least 25 nt. 
A variety of different array formats have been developed and are known to those of skill in the 
art, including those described in U.S. Patent Nos. 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 
5,429,807; 5,436,327; 5,445,934; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 
5,556,752; 5,561,071; 5,599,895; 5,624,711; 5,639,603; 5,658,734; WO 93/17126; WO 
95/11995; WO 95/35505; EP 742287; and EP 799897. The arrays of the subject invention find 
use in a variety of applications, including gene expression analysis, drug screening, mutation 
analysis and the like, as disclosed in the above-listed exemplary patent documents. 

In addition to the above nucleic acid libraries, analogous libraries of polypeptides are 
also provided, where the polypeptides of the library will represent at least a portion of 
the polypeptides encoded by SEQ ID NOS: 1-844. 

VII. Utilities 

A. Use of Polynucleotide Probes in Mapping, and in Tissue Profiling 
Polynucleotide probes, generally comprising at least 12 contiguous nucleotides of a 
polynucleotide as shown in the Sequence Listing, are used for a variety of purposes, such as 
chromosome mapping of the polynucleotide and detection of transcription levels. Additional 
disclosure about preferred regions of the disclosed polynucleotide sequences is found in the 
Examples. A probe that hybridizes specifically to a polynucleotide disclosed herein should 
provide a detection signal at least 5-, 10-, or 20-fold higher than the background hybridization 
provided with other unrelated sequences. 
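
The fold-over-background criterion stated above can be expressed as a simple ratio test. The Python sketch below is illustrative only; the function names and the numerical signal values are hypothetical, and the default threshold of 5 is simply the lowest of the three thresholds recited above.

```python
def specificity_fold(signal, background):
    """Return the ratio of probe signal to background hybridization signal."""
    if background <= 0:
        raise ValueError("background signal must be positive")
    return signal / background


def meets_threshold(signal, background, min_fold=5.0):
    """Return True if the signal is at least min_fold times the background."""
    return specificity_fold(signal, background) >= min_fold


# Hypothetical measurements in arbitrary units.
fold = specificity_fold(1200.0, 100.0)
passes_10_fold = meets_threshold(1200.0, 100.0, min_fold=10.0)
passes_20_fold = meets_threshold(1200.0, 100.0, min_fold=20.0)
```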

Probes in Detection of Expression Levels. Nucleotide probes are used to detect 
expression of a gene corresponding to the provided polynucleotide. The references describe an 
example of a sandwich nucleotide hybridization assay. For example, in Northern blots, mRNA 
is separated electrophoretically and contacted with a probe. A probe is detected as hybridizing 
to an mRNA species of a particular size. The amount of hybridization is quantitated to 
determine relative amounts of expression, for example under a particular condition. Probes are 
also used to detect products of amplification by polymerase chain reaction. The products of the 
reaction are hybridized to the probe and hybrids are detected. Probes are used for in situ 
hybridization to cells to detect expression. Probes can also be used in vivo for diagnostic 
detection of hybridizing sequences. Probes are typically labeled with a radioactive isotope. 
Other types of detectable labels can be used such as chromophores, fluors, and enzymes. Other 
examples of nucleotide hybridization assays are described in WO92/02526 and U.S. Patent No. 
5,124,246. 

Alternatively, the Polymerase Chain Reaction (PCR) is another means for detecting 
small amounts of target nucleic acids (see, e.g., Mullis et al., Meth. Enzymol. (1987) 155:335; 
U.S. Patent No. 4,683,195; and U.S. Patent No. 4,683,202). Two primer polynucleotides 
hybridize with the target nucleic acids and are used to prime the reaction. The 
primers can be composed of sequence within or 3' and 5' to the polynucleotides of the Sequence 
Listing. Alternatively, if the primers are 3' and 5' to these polynucleotides, they need not 
hybridize to them or the complements. A thermostable polymerase creates copies of target 
nucleic acids from the primers using the original target nucleic acids as a template. After a large 
amount of target nucleic acids is generated by the polymerase, it is detected by methods such as 
Southern blots. When using the Southern blot method, the labeled probe will hybridize to a 
polynucleotide of the Sequence Listing or complement. 
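
As a simplified illustration of how two primers flank a target, the Python sketch below derives a forward primer from the 5' end of one strand and a reverse primer as the reverse complement of the 3' end of that strand. The template sequence and function names are hypothetical, and the primer length of 6 is used only to keep the example short; practical primers are considerably longer and are chosen with regard to melting temperature and specificity.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def primer_pair(template, length=6):
    """Return (forward, reverse) primers flanking the given template strand."""
    if len(template) < 2 * length:
        raise ValueError("template is too short for two primers")
    top = template.upper()
    forward = top[:length]                       # identical to the 5' end
    reverse = reverse_complement(top[-length:])  # anneals to the 3' end
    return forward, reverse


# Placeholder template for illustration only.
forward, reverse = primer_pair("ATGGCCTTAAGGCTAGCTAA", length=6)
```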

Furthermore, mRNA or cDNA can be detected by traditional blotting techniques 
described in Sambrook et al, "Molecular Cloning: A Laboratory Manual" (New York, Cold 
Spring Harbor Laboratory, 1989). mRNA or cDNA generated from mRNA using a polymerase 
enzyme can be purified and separated using gel electrophoresis. The nucleic acids on the gel are 
then blotted onto a solid support, such as nitrocellulose. The solid support is exposed to a 
labeled probe and then washed to remove any unhybridized probe. Next, the duplexes 
containing the labeled probe are detected. Typically, the probe is labeled with radioactivity. 

Mapping. Polynucleotides of the present invention are used to identify a chromosome 
on which the corresponding gene resides. Such mapping can be useful in identifying the 
function of the polynucleotide-related gene by its proximity to other genes with known function. 
Function can also be assigned to the polynucleotide-related gene when particular syndromes or 
diseases map to the same chromosome. For example, use of polynucleotide probes in 
identification and quantification of nucleic acid sequence aberrations is described in U.S. Patent 
No. 5,783,387. 

For example, fluorescence in situ hybridization (FISH) on normal metaphase spreads 
facilitates comparative genomic hybridization to allow total genome assessment of changes in 
relative copy number of DNA sequences. See Schwartz and Samad, Curr. Opin. Biotechnol. 
(1994) 5:70; Kallioniemi et al., Sem. Cancer Biol. (1993) 4:41; Valdes et al., Methods in 
Molecular Biology (1997) 68:1, Boultwood, ed., Humana Press, Totowa, NJ. Preparations of 
human metaphase chromosomes are prepared using standard cytogenetic techniques from 
human primary tissues or cell lines. Nucleotide probes comprising at least 12 contiguous 
nucleotides selected from the nucleotide sequence shown in the Sequence Listing are used to 
identify the corresponding chromosome. The nucleotide probes are labeled, for example, with a 
radioactive, fluorescent, biotinylated, or chemiluminescent label, and detected by well known 
methods appropriate for the particular label selected. Protocols for hybridizing nucleotide 
probes to preparations of metaphase chromosomes are also well known in the art. A nucleotide 
probe will hybridize specifically to nucleotide sequences in the chromosome preparations that 
are complementary to the nucleotide sequence of the probe. 

Polynucleotides are mapped to particular chromosomes using, for example, radiation 
hybrids or chromosome-specific hybrid panels. See Leach et al., Advances in Genetics (1995) 
33:63-99; Walter et al., Nature Genetics (1994) 7:22; Walter and Goodfellow, Trends in 
Genetics (1992) 9:352. Panels for radiation hybrid mapping are available from Research 
Genetics, Inc., Huntsville, Alabama, USA. Databases for markers using various panels are 
available via the world wide web at http://shgc-www.stanford.edu; and 
http://www-genome.wi.mit.edu/cgi-bin/contig/rhmapper.pl. The statistical program RHMAP can be used to 
construct a map based on the data from radiation hybridization with a measure of the relative 
likelihood of one order versus another. RHMAP is available via the world wide web at 
http://www.sph.umich.edu/group/statgen/software. 

In addition, commercial programs are available for identifying regions of chromosomes 
commonly associated with disease, such as cancer. Polynucleotides based on the 
polynucleotides of the invention can be used to probe these regions. For example, if through 
profile searching a provided polynucleotide is identified as corresponding to a gene encoding a 
kinase, its ability to bind to a cancer-related chromosomal region will suggest its role as a kinase 
in one or more stages of tumor cell development/growth. Although some experimentation 
would be required to elucidate the role, the polynucleotide constitutes a new material for 
isolating a specific protein that has potential for developing a cancer diagnostic or therapeutic. 

Tissue Typing or Profiling. Expression of specific mRNA corresponding to the provided 
polynucleotides can vary in different cell types and can be tissue-specific. This variation of 
mRNA levels in different cell types can be exploited with nucleic acid probe assays to determine 
tissue types. For example, PCR, branched DNA probe assays, or blotting techniques utilizing 
nucleic acid probes substantially identical or complementary to polynucleotides listed in the 
Sequence Listing can determine the presence or absence of the corresponding cDNA or mRNA. 

For example, a metastatic lesion is identified by its developmental organ or tissue source 
by identifying the expression of a particular marker of that organ or tissue. If a polynucleotide 
is expressed only in a specific tissue type, and a metastatic lesion is found to express that 
polynucleotide, then the developmental source of the lesion has been identified. Expression of a 
particular polynucleotide is assayed by detection of either the corresponding mRNA or the 
protein product. Immunological methods, such as antibody staining, are used to detect a 
particular protein product. Hybridization methods can be used to detect particular mRNA 
species, including but not limited to in situ hybridization and Northern blotting. 

Use of Polymorphisms. A polynucleotide of the invention will be useful in forensics, 
genetic analysis, mapping, and diagnostic applications if the corresponding region of a gene is 
polymorphic in the human population. Particular polymorphic forms of the provided 
polynucleotides can be used to either identify a sample as deriving from a suspect or rule out the 
possibility that the sample derives from the suspect. Any means for detecting a polymorphism 
in a gene are used, including but not limited to electrophoresis of protein polymorphic variants, 
differential sensitivity to restriction enzyme cleavage, and hybridization to allele-specific 
probes. 

B. Antibody Production 

Expression products of a polynucleotide of the invention, the corresponding mRNA or 
cDNA, or the corresponding complete gene are prepared and used for raising antibodies for 
experimental, diagnostic, and therapeutic purposes. For polynucleotides to which a 
corresponding gene has not been assigned, this provides an additional method of identifying the 
corresponding gene. The polynucleotide or related cDNA is expressed as described above, and 
antibodies are prepared. These antibodies are specific to an epitope on the polypeptide encoded 
by the polynucleotide, and can precipitate or bind to the corresponding native protein in a cell or 
tissue preparation or in a cell-free extract of an in vitro expression system. 

Immunogens for raising antibodies are prepared by mixing the polypeptides encoded by 
the polynucleotides of the present invention with adjuvants. Alternatively, polypeptides are 
made as fusion proteins to larger immunogenic proteins. Polypeptides are also covalently linked 
to other larger immunogenic proteins, such as keyhole limpet hemocyanin. Immunogens are 
typically administered intradermally, subcutaneously, or intramuscularly. Immunogens are 
administered to experimental animals such as rabbits, sheep, and mice, to generate antibodies. 
Optionally, the animal spleen cells are isolated and fused with myeloma cells to form 
hybridomas which secrete monoclonal antibodies. Such methods are well known in the art. 
According to another method known in the art, the selected polynucleotide is administered 
directly, such as by intramuscular injection, and expressed in vivo. The expressed protein 
generates a variety of protein-specific immune responses, including production of antibodies, 
comparable to administration of the protein. 

Preparations of polyclonal and monoclonal antibodies specific for polypeptides encoded 
by a selected polynucleotide are made using standard methods known in the art. The antibodies 
specifically bind to epitopes present in the polypeptides encoded by polynucleotides disclosed in 
the Sequence Listing. Typically, at least 6, 8, 10, or 12 contiguous amino acids are required to 
form an epitope. However, epitopes which involve non-contiguous amino acids may require 
more, for example at least 15, 25, or 50 amino acids. A short sequence of a polynucleotide may 
then be unsuitable for use as an epitope to raise antibodies for identifying the corresponding 
novel protein, because of the potential for cross-reactivity with a known protein. However, the 
antibodies can be useful for other purposes, particularly if they identify common structural 
features of a known protein and a novel polypeptide encoded by a polynucleotide of the 
invention. 

Antibodies that specifically bind to human polypeptides encoded by the provided 
polynucleotides should provide a detection signal at least 5-, 10-, or 20-fold higher than a detection 
signal provided with other proteins when used in Western blots or other immunochemical 
assays. Preferably, antibodies that specifically bind polypeptides of the invention do not bind to 
other proteins in immunochemical assays at detectable levels and can immunoprecipitate the 
specific polypeptide from solution. 

To test for the presence of serum antibodies to the polypeptide of the invention in a 
human population, human antibodies are purified by methods well known in the art. Preferably, 
the antibodies are affinity purified by passing antiserum over a column to which the 
corresponding selected polypeptide or fusion protein is bound. The bound antibodies can then 
be eluted from the column, for example using a buffer with a high salt concentration. 

In addition to the antibodies discussed above, genetically engineered antibody 
derivatives are made, such as single chain antibodies, according to methods well known in the 
art. 

C. Use of Polynucleotides to Construct Arrays for Diagnostics 
Polynucleotide arrays provide a high throughput technique that can assay a large number 
of polynucleotide sequences in a sample. This technology can be used as a diagnostic and as a 
tool to test for differential expression to determine function of an encoded protein. Arrays can 
be created by spotting polynucleotide probes onto a substrate (e.g., glass, nitrocellulose, etc.) in a 
two-dimensional matrix or array having bound probes. The probes can be bound to the substrate 
by either covalent bonds or by non-specific interactions, such as hydrophobic interactions. 
Samples of polynucleotides can be detectably labeled (e.g., using radioactive or fluorescent 
labels) and then hybridized to the probes. Double stranded polynucleotides, comprising the 
labeled sample polynucleotides bound to probe polynucleotides, can be detected once the 
unbound portion of the sample is washed away. Techniques for constructing arrays and 
methods of using these arrays are described in EP No. 0 799 897; PCT No. WO 97/29212; PCT 
No. WO 97/27317; EP No. 0 785 280; PCT No. WO 97/02357; U.S. Pat. No. 5,593,839; U.S. 
Pat. No. 5,578,832; EP No. 0 728 520; U.S. Pat. No. 5,599,695; EP No. 0 721 016; U.S. Pat. 
No. 5,556,752; PCT No. WO 95/22058; and U.S. Pat. No. 5,631,734. 

As discussed in some detail above, arrays can be used to examine differential expression 
of genes and can be used to determine gene function. For example, arrays of the instant 
polynucleotide sequences can be used to determine if any of the provided polynucleotides are 
differentially expressed between a test cell and control cell (e.g., cancer cells and normal cells). 
For example, high expression of a particular message in a cancer cell, which is not observed in a 
corresponding normal cell, can indicate a cancer specific protein. Exemplary uses of arrays are 
further described in, for example, Pappalarado et al., Sem. Radiation Oncol. (1998) 5:217; and 
Ramsay, Nature Biotechnol. (1998) 16:40. 
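
By way of a non-limiting sketch, the comparison of array signals between a test cell and a control cell can be expressed as a per-probe ratio test. In the Python example below, the probe names, the signal values, and the two-fold cutoff are hypothetical assumptions chosen for illustration; they are not values taken from this disclosure.

```python
def differential_probes(test, control, min_fold=2.0):
    """Return {probe: test/control ratio} for probes changed by min_fold or more.

    A probe is reported when its ratio is at least min_fold (higher in the
    test sample) or at most 1/min_fold (lower in the test sample).
    """
    flagged = {}
    for probe in sorted(set(test) & set(control)):
        if test[probe] <= 0 or control[probe] <= 0:
            continue  # skip non-positive signals, for which a ratio is undefined
        fold = test[probe] / control[probe]
        if fold >= min_fold or fold <= 1.0 / min_fold:
            flagged[probe] = fold
    return flagged


# Hypothetical hybridization signals in arbitrary units.
test_signal = {"probe_1": 900.0, "probe_2": 110.0, "probe_3": 40.0}
control_signal = {"probe_1": 100.0, "probe_2": 100.0, "probe_3": 200.0}

flagged = differential_probes(test_signal, control_signal)
```

In practice, signals would typically be normalized across arrays and replicated before any such comparison is drawn.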

D. Differential Expression 

The polynucleotides of the invention can also be used to detect differences in expression 
levels between two cells, e.g., as a method to identify abnormal or diseased tissue in a human. 
For polynucleotides corresponding to profiles of protein families as described above, the choice 
of tissue can be selected according to the putative biological function. In general, the expression 
of a gene corresponding to a specific polynucleotide is compared between a first tissue that is 
suspected of being diseased and a second, normal tissue of the human. The tissue suspected of 
being abnormal or diseased can be derived from a different tissue type of the human, but 
preferably it is derived from the same tissue type; for example an intestinal polyp or other 
abnormal growth should be compared with normal intestinal tissue. The normal tissue can be 
the same tissue as that of the test sample, or any normal tissue of the patient, especially those 
that express the polynucleotide-related gene of interest (e.g., brain, thymus, testis, heart, 
prostate, placenta, spleen, small intestine, skeletal muscle, pancreas, and the mucosal lining of 
the colon). A difference between the polynucleotide-related gene, mRNA, or protein in the two 
tissues which are compared, for example in molecular weight, amino acid or nucleotide 
sequence, or relative abundance, indicates a change in the gene, or a gene which regulates it, in 
the tissue of the human that was suspected of being diseased. Examples of detection of 
differential expression and its use in diagnosis of cancer are described in U.S. Patent Nos. 
5,688,641 and 5,677,125. 

The polynucleotide-related genes in the two tissues are compared by any means known 
in the art. For example, the two genes can be sequenced, and the sequence of the gene in the 
tissue suspected of being diseased compared with the gene sequence in the normal tissue. The 
genes corresponding to a provided polynucleotide, or portions thereof, in the two tissues are 
amplified, for example using nucleotide primers based on the nucleotide sequence shown in the 
Sequence Listing, using the polymerase chain reaction. The amplified genes or portions of 
genes are hybridized to detectably labeled nucleotide probes selected from a nucleotide 
sequence shown in the Sequence Listing. A difference in the nucleotide sequence of the isolated 
gene in the tissue suspected of being diseased compared with the normal nucleotide sequence 
suggests a role of the gene product encoded by the subject polynucleotide in the disease, and 
provides guidance for preparing a therapeutic agent. 

Alternatively, mRNA corresponding to a provided polynucleotide in the two tissues is 
compared. PolyA+ RNA is isolated from the two tissues as is known in the art. For example, 
one of skill in the art can readily determine differences in the size or amount of mRNA 
transcripts between the two tissues using Northern blots and detectably labeled nucleotide 
probes selected from the nucleotide sequence shown in the Sequence Listing. Increased or 
decreased expression of a given mRNA in a tissue sample suspected of being diseased, 
compared with the expression of the same mRNA in a normal tissue, suggests that the expressed 
protein has a role in the disease, and also provides a lead for preparing a therapeutic agent. 

The comparison can also be accomplished by analyzing polypeptides between the 
matched samples. The sizes of the proteins in the two tissues are compared, for example, using 
antibodies of the present invention to detect polypeptides in Western blots of protein extracts 
from the two tissues. Other changes, such as expression levels and subcellular localization, can 
also be detected immunologically, using antibodies to the corresponding protein. A higher or 
lower level of expression of a given polypeptide in a tissue suspected of being diseased, 
compared with the same protein expression level in a normal tissue, is indicative that the 
expressed protein has a role in the disease, and provides guidance for preparing a therapeutic 
agent. 

Similarly, comparison of polynucleotide sequences or of gene expression products, e.g., 
mRNA and protein, between a human tissue that is suspected of being diseased and a normal 
tissue of a human, are used to follow disease progression or remission in the human. Such 
comparisons are made as described above. For example, increased or decreased expression of a 
gene corresponding to an inventive polynucleotide in the tissue suspected of being neoplastic 
can indicate the presence of neoplastic cells in the tissue. The degree of increased expression of 
a given gene in the neoplastic tissue relative to expression of the same gene in normal tissue, or 
differences in the amount of increased expression of a given gene in the neoplastic tissue over 
time, is used to assess the progression of the neoplasia in that tissue or to monitor the response 
of the neoplastic tissue to a therapeutic protocol over time. 

The expression pattern of any two cell types can be compared, such as low and high 
metastatic tumor cell lines, malignant or non-malignant cells, or cells from tissue which have 
and have not been exposed to a therapeutic agent. A genetic predisposition to disease in a 
human is detected by comparing expression levels of an mRNA or protein corresponding to a 
polynucleotide of the invention in a fetal tissue with levels associated in normal fetal tissue. 
Fetal tissues that are used for this purpose include, but are not limited to, amniotic fluid, 
chorionic villi, blood, and the blastomere of an in vitro-fertilized embryo. The comparable 
normal polynucleotide-related gene is obtained from any tissue. The mRNA or protein is 
obtained from a normal tissue of a human in which the polynucleotide-related gene is expressed. 
Differences such as alterations in the nucleotide sequence or size of the same product of the 
fetal polynucleotide-related gene or mRNA, or alterations in the molecular weight, amino acid 
sequence, or relative abundance of fetal protein, can indicate a germline mutation in the 
polynucleotide-related gene of the fetus, which indicates a genetic predisposition to disease. 
Particular diagnostic and prognostic uses of the disclosed polynucleotides are described in more 
detail below. 

E. Diagnostic, Prognostic, and Other Uses Based On Differential Expression 
In general, diagnostic methods of the invention involve detection of a level or amount 
of a gene product, particularly a differentially expressed gene product, in a test sample obtained 
from a patient suspected of having or being susceptible to a disease (e.g., breast cancer, lung 
cancer, colon cancer and/or metastatic forms thereof), and comparing the detected levels to those 
levels found in normal cells (e.g., cells substantially unaffected by cancer) and/or other control 
cells (e.g., to differentiate a cancerous cell from a cell affected by dysplasia). Furthermore, the 
severity of the disease can be assessed by comparing the detected levels of a differentially 
expressed gene product with those levels detected in samples representing the levels of 
differentially expressed gene product associated with varying degrees of severity of disease. 

The term "differentially expressed gene" is intended to encompass a polynucleotide that 
can, for example, include an open reading frame encoding a gene product (e.g., a polypeptide), 
and/or introns of such genes and adjacent 5' and 3' non-coding nucleotide sequences involved in 
the regulation of expression, up to about 20 kb beyond the coding region, but possibly further in 
either direction. The gene can be introduced into an appropriate vector for extrachromosomal 
maintenance or for integration into a host genome. In general, a difference in expression level 
associated with a decrease in expression level of at least about 25%, usually at least about 50% 
to 75%, more usually at least about 90% or more is indicative of a differentially expressed gene 
of interest, i.e., a gene that is underexpressed or down-regulated in the test sample relative to a 
control sample. Furthermore, a difference in expression level associated with an increase in 
expression of at least about 25%, usually at least about 50% to 75%, more usually at least about 
90%, and which can be at least about 1½-fold, usually at least about 2-fold to about 10-fold, and 
can be about 100-fold to about 1,000-fold relative to a control sample, is indicative of a 
differentially expressed gene of interest, i.e., an overexpressed or up-regulated gene. 
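
By way of illustration only, the percentage thresholds described above can be expressed as a 
short computation. The following is a minimal sketch, assuming that expression levels have 
already been measured in comparable units for the test and control samples; the function names 
and the default threshold of 25% are hypothetical choices made for this example and are not 
part of the disclosed methods. 

```python
def fold_change(test_level: float, control_level: float) -> float:
    """Return the test/control expression ratio (2.0 means a 2-fold increase)."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return test_level / control_level


def classify_expression(test_level: float, control_level: float,
                        min_change: float = 0.25) -> str:
    """Classify a gene by its fractional change relative to the control.

    min_change is the minimum fractional difference treated as meaningful
    (0.25 corresponds to "at least about 25%"); it is an assumed default.
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    change = (test_level - control_level) / control_level
    if change >= min_change:
        return "overexpressed"
    if change <= -min_change:
        return "underexpressed"
    return "unchanged"


if __name__ == "__main__":
    print(classify_expression(150.0, 100.0))  # 50% increase
    print(classify_expression(70.0, 100.0))   # 30% decrease
    print(fold_change(200.0, 100.0))
```

A stricter screen can be obtained by raising `min_change`, e.g., to 0.5 or 0.9, in line with 
the "usually" and "more usually" thresholds recited above. 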

"Differentially expressed polynucleotide" as used herein means a nucleic acid molecule 
(RNA or DNA) having a sequence that represents a differentially expressed gene, e.g., the 
differentially expressed polynucleotide comprises a sequence (e.g., an open reading frame 
encoding a gene product) that uniquely identifies a differentially expressed gene so that 
detection of the differentially expressed polynucleotide in a sample is correlated with the 
presence of a differentially expressed gene in a sample. "Differentially expressed 
polynucleotides" is also meant to encompass fragments of the disclosed polynucleotides, e.g., 
fragments retaining biological activity, as well as nucleic acids homologous, substantially 
similar, or substantially identical (e.g., having about 90% sequence identity) to the disclosed 
polynucleotides. 

Methods of the subject invention useful in diagnosis or prognosis typically involve 
comparison of the abundance of a selected differentially expressed gene product in a sample of 
interest with that of a control to determine any relative differences in the expression of the gene 
product, where the difference can be measured qualitatively and/or quantitatively. Quantitation 
can be accomplished, for example, by comparing the level of expression product detected in the 
sample with the amounts of product present in a standard curve. A comparison can be made 
visually; by using a technique such as densitometry, with or without computerized assistance; by 
preparing a representative library of cDNA clones of mRNA isolated from a test sample, 
sequencing the clones in the library to determine the number of cDNA clones corresponding to 
the same gene product, and analyzing the number of clones corresponding to that same gene 
product relative to the number of clones of the same gene product in a control sample; or by 
using an array to detect relative levels of hybridization to a selected sequence or set of 
sequences, and comparing the hybridization pattern to that of a control. The differences in 
expression are then correlated with the presence or absence of an abnormal expression pattern. 
A variety of different methods for determining the nucleic acid abundance in a sample are 
known to those of skill in the art, where particular methods of interest include those described 
in: Pietu et al., Genome Res. (1996) 6:492; Zhao et al., Gene (1995) 156:207; Soares, Curr. 
Opin. Biotechnol. (1997) 8:542; Raval, J. Pharmacol. Toxicol. Methods (1994) 32:125; 
Chalifour et al., Anal. Biochem. (1994) 216:299; Stolz et al., Mol. Biotechnol. (1996) 6:225; 
Hong et al., Biosci. Reports (1982) 2:907; and McGraw, Anal. Biochem. (1984) 143:298. Also 
of interest are the methods disclosed in WO 97/27317, the disclosure of which is herein 
incorporated by reference. 

In general, diagnostic assays of the invention involve detection of a gene product of the 
polynucleotide sequence (e.g., mRNA or polypeptide) that corresponds to a sequence of SEQ ID 
NOS: 1-844. The patient from whom the sample is obtained can be apparently healthy, 
susceptible to disease (e.g., as determined by family history or exposure to certain 
environmental factors), or can already be identified as having a condition in which altered 
expression of a gene product of the invention is implicated. 

In the assays of the invention, the diagnosis can be determined based on detected gene 
product expression levels of a gene product encoded by at least one, preferably at least two or 
more, at least 3 or more, or at least 4 or more of the polynucleotides having a sequence set forth 
in SEQ ID NOS: 1-844, and can involve detection of expression of genes corresponding to all of 
SEQ ID NOS: 1-844 and/or additional sequences that can serve as additional diagnostic markers 
and/or reference sequences. Where the diagnostic method is designed to detect the presence or 
susceptibility of a patient to cancer, the assay preferably involves detection of a gene product 
encoded by a gene corresponding to a polynucleotide that is differentially expressed in cancer. 
For example, a higher level of expression of a polynucleotide corresponding to SEQ ID NO:52 
relative to a level associated with a normal sample can indicate the presence of cancer in the 
patient from whom the sample is derived. In another example, detection of a lower level of a 
polynucleotide corresponding to SEQ ID NO:39 relative to a normal level is indicative of the 
presence of cancer in the patient. Further examples of such differentially expressed 
polynucleotides are described in the Examples below. Given the provided polynucleotides and 
information regarding their relative expression levels provided herein, assays using such 
polynucleotides and detection of their expression levels in diagnosis and prognosis will be 
readily apparent to the ordinarily skilled artisan. 

Any of a variety of detectable labels can be used in connection with the various 
embodiments of the diagnostic methods of the invention. Suitable detectable labels include 
fluorochromes (e.g., fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, 
allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6- 
carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX), 6-carboxy-2',4',7',4,7- 
hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) or N,N,N',N'-tetramethyl-6- 
carboxyrhodamine (TAMRA)), radioactive labels (e.g., ³²P, ³⁵S, ³H, etc.), and the like. The 
detectable label can involve a two-stage system (e.g., biotin-avidin, hapten-anti-hapten 
antibody, etc.). 

Reagents specific for the polynucleotides and polypeptides of the invention, such as 
antibodies and nucleotide probes, can be supplied in a kit for detecting the presence of an 
expression product in a biological sample. The kit can also contain buffers or labeling 
components, as well as instructions for using the reagents to detect and quantify expression 
products in the biological sample. Exemplary embodiments of the diagnostic methods of the 
invention are described below in more detail. 

Polypeptide detection in diagnosis. In one embodiment, the test sample is assayed for 
the level of a differentially expressed polypeptide. Diagnosis can be accomplished using any of 
a number of methods to determine the absence or presence or altered amounts of the 
differentially expressed polypeptide in the test sample. For example, detection can utilize 
staining of cells or histological sections with labeled antibodies, performed in accordance with 
conventional methods. Cells can be permeabilized to stain cytoplasmic molecules. In general, 
antibodies that specifically bind a differentially expressed polypeptide of the invention are 
added to a sample, and incubated for a period of time sufficient to allow binding to the epitope, 
usually at least about 10 minutes. The antibody can be detectably labeled for direct detection 
(e.g., using radioisotopes, enzymes, fluorescers, chemiluminescers, and the like), or can be used 
in conjunction with a second stage antibody or reagent to detect binding (e.g., biotin with 
horseradish peroxidase-conjugated avidin, a secondary antibody conjugated to a fluorescent 
compound, e.g. fluorescein, rhodamine, Texas red, etc.). The absence or presence of antibody 
binding can be determined by various methods, including flow cytometry of dissociated cells, 
microscopy, radiography, scintillation counting, etc. Any suitable alternative method of 
qualitative or quantitative detection of levels or amounts of differentially expressed polypeptide 
can be used, for example ELISA, western blot, immunoprecipitation, radioimmunoassay, etc. 

In general, the detected level of differentially expressed polypeptide in the test sample is 
compared to a level of the differentially expressed gene product in a reference or control sample, 
e.g., in a normal cell (negative control) or in a cell having a known disease state (positive 
control). For example, a higher level of expression of a polypeptide encoded by SEQ ID NO:52 
relative to a level associated with a normal sample can indicate the presence of cancer in the 
patient from whom the sample is derived. In another example, detection of a lower level of the 
polypeptide encoded by SEQ ID NO:39 relative to a normal level is indicative of the presence of 
cancer in the patient. 

mRNA detection. The diagnostic methods of the invention can also or alternatively 
involve detection of mRNA encoded by a gene corresponding to a differentially expressed 
polynucleotide of the invention. Any suitable qualitative or quantitative methods known in the 
art for detecting specific mRNAs can be used. mRNA can be detected by, for example, in situ 
hybridization in tissue sections, by reverse transcriptase-PCR, or in Northern blots containing 
poly A+ mRNA. One of skill in the art can readily use these methods to determine differences 
in the size or amount of mRNA transcripts between two samples. For example, the level of 
mRNA of the invention in a tissue sample suspected of being cancerous or dysplastic is 
compared with the expression of the mRNA in a reference sample, e.g., a positive or negative 
control sample (e.g., normal tissue, cancerous tissue, etc.). In a specific non-limiting example, a 
higher level of mRNA corresponding to SEQ ID NO:52 relative to a level associated with a 
normal sample can indicate the presence of cancer in the patient from whom the sample is 
derived. In another example, detection of a lower level of mRNA corresponding to SEQ ID 
NO:39 relative to a normal level is indicative of the presence of cancer in the patient. 

Any suitable method for detecting and comparing mRNA expression levels in a sample 
can be used in connection with the diagnostic methods of the invention (see, e.g., 
U.S. 5,804,382). For example, mRNA expression levels in a sample can be determined by 
generation of a library of expressed sequence tags (ESTs) from the sample, where the EST 
library is representative of sequences present in the sample (Adams, et al., (1991) Science 
252:1651). Enumeration of the relative representation of ESTs within the library can be used to 
approximate the relative representation of the gene transcript within the starting sample. The 
results of EST analysis of a test sample can then be compared to EST analysis of a reference 
sample to determine the relative expression levels of a selected polynucleotide, particularly a 
polynucleotide corresponding to one or more of the differentially expressed genes described 
herein. 
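
The EST enumeration described above amounts to counting how often each gene is represented 
in a library and comparing the resulting fractions between libraries. The following is a 
minimal sketch, assuming that each sequenced EST has already been assigned to a gene 
identifier; the function names and the example identifiers are hypothetical and are used for 
illustration only. 

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def relative_representation(est_gene_ids: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of ESTs in a library assigned to each gene."""
    counts = Counter(est_gene_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the EST library is empty")
    return {gene: n / total for gene, n in counts.items()}


def expression_ratio(test_ests: Iterable[str], reference_ests: Iterable[str],
                     gene: str) -> Optional[float]:
    """Return test/reference relative representation for one gene.

    Returns None when the gene is absent from the reference library,
    because the ratio is then undefined.
    """
    test = relative_representation(test_ests).get(gene, 0.0)
    reference = relative_representation(reference_ests).get(gene, 0.0)
    if reference == 0.0:
        return None
    return test / reference


if __name__ == "__main__":
    test_library = ["geneA", "geneA", "geneA", "geneB"]
    reference_library = ["geneA", "geneB", "geneB", "geneB"]
    print(expression_ratio(test_library, reference_library, "geneA"))
```

Because the estimate depends on library depth, small libraries give only a coarse 
approximation of relative transcript abundance. 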

Alternatively, gene expression in a test sample can be analyzed using serial analysis of 
gene expression (SAGE) methodology (Velculescu et al., Science (1995) 270:484). In short, 
SAGE involves the isolation of short unique sequence tags from a specific location within each 
transcript (e.g., a sequence of any one of SEQ ID NOS: 1-6). The sequence tags are 
concatenated, cloned, and sequenced. The frequency of particular transcripts within the starting 
sample is reflected by the number of times the associated sequence tag is encountered within the 
sequence population. 
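
The tag-based counting underlying SAGE can be illustrated computationally. The following is a 
simplified sketch, assuming that the tag for a transcript is the fixed-length sequence 
immediately 3' of the 3'-most occurrence of an anchoring restriction site; the site CATG, the 
tag length of 10 nt, and the function names are assumptions made for this example, and the 
ditag formation, concatenation, and cloning steps are omitted. 

```python
from collections import Counter
from typing import Dict, Iterable, Optional

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme recognition site


def extract_tag(transcript: str, tag_length: int = 10) -> Optional[str]:
    """Return the tag 3' of the 3'-most anchor site, or None if unavailable.

    The transcript is given 5' to 3'. None is returned when the site is
    absent or fewer than tag_length bases follow it.
    """
    sequence = transcript.upper()
    position = sequence.rfind(ANCHOR_SITE)
    if position == -1:
        return None
    start = position + len(ANCHOR_SITE)
    tag = sequence[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def tag_frequencies(tags: Iterable[Optional[str]]) -> Dict[str, float]:
    """Return the fraction of observed tags matching each distinct tag."""
    counts = Counter(tag for tag in tags if tag is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tags were observed")
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    print(extract_tag("AAACATGCCCCATGACGTACGTACGG"))
```

The frequency of each tag in the sequenced population then serves as an estimate of the 
frequency of the corresponding transcript in the starting sample. 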

Gene expression in a test sample can also be analyzed using differential display (DD) 
methodology. In DD, fragments defined by specific sequence delimiters (e.g., restriction 
enzyme sites) are used as unique identifiers of genes, coupled with information about fragment 
length or fragment location within the expressed gene. The relative representation of an 
expressed gene within a sample can then be estimated based on the relative representation of the 
fragment associated with that gene within the pool of all possible fragments. Methods and 
compositions for carrying out DD are well known in the art, see, e.g., U.S. 5,776,683; and U.S. 
5,807,680. 

Alternatively, gene expression in a sample can be analyzed using hybridization analysis, which 
is based on the specificity of nucleotide interactions. Oligonucleotides or cDNA can be used to 
selectively identify or capture DNA or RNA of specific sequence composition, and the amount 
of RNA or cDNA hybridized to a known capture sequence can be determined qualitatively or 
quantitatively, to provide information about the relative representation of a particular message 
within the pool of cellular messages in a sample. Hybridization analysis can be designed to 
allow for concurrent screening of the relative expression of hundreds to thousands of genes by 
using, for example, array-based technologies having high density formats, including filters, 
microscope slides, or microchips, or solution-based technologies that use spectroscopic analysis 
(e.g., mass spectrometry). One exemplary use of arrays in the diagnostic methods of the 
invention is described below in more detail. 

Use of a single gene in diagnostic applications. The diagnostic methods of the invention 
can focus on the expression of a single differentially expressed gene. For example, the 
diagnostic method can involve detecting a differentially expressed gene, or a polymorphism of 
such a gene (e.g., a polymorphism in a coding region or control region), that is associated with 
disease. Disease-associated polymorphisms can include deletion or truncation of the gene, 
mutations that alter expression level and/or affect activity of the encoded protein, etc. 

Changes in the promoter or enhancer sequence that affect expression levels of a 
differentially expressed gene can be compared to expression levels of the normal allele by 
various methods known in the art. Methods for determining promoter or enhancer strength 
include quantitation of the expressed natural protein; insertion of the variant control element 
into a vector with a reporter gene such as β-galactosidase, luciferase, chloramphenicol 
acetyltransferase, etc. that provides for convenient quantitation; and the like. 

A number of methods are available for analyzing nucleic acids for the presence of a 
specific sequence, e.g., a disease-associated polymorphism. Where large amounts of DNA are 
available, genomic DNA is used directly. Alternatively, the region of interest is cloned into a 
suitable vector and grown in sufficient quantity for analysis. Cells that express a differentially 
expressed gene can be used as a source of mRNA, which can be assayed directly or reverse 
transcribed into cDNA for analysis. The nucleic acid can be amplified by conventional 
techniques, such as the polymerase chain reaction (PCR), to provide sufficient amounts for 
analysis, and a detectable label can be included in the amplification reaction (e.g., using a 
detectably labeled primer or detectably labeled oligonucleotides) to facilitate detection. The use 
of the polymerase chain reaction is described in Saiki et al., Science (1985) 239:487, and a 
review of techniques can be found in Sambrook et al., Molecular Cloning: A Laboratory 
Manual, (1989) pp. 14.2. Alternatively, various methods are known in the art that utilize 
oligonucleotide ligation as a means of detecting polymorphisms; for examples see Riley et al., 
Nucl. Acids Res. (1990) 18:2887; and Delahunty et al., Am. J. Hum. Genet. (1996) 58:1239. 

The sample nucleic acid, e.g. amplified or cloned fragment, is analyzed by one of a 
number of methods known in the art. The nucleic acid can be sequenced by dideoxy or other 
methods, and the sequence of bases compared to a selected sequence, e.g., to a wild-type 
sequence. Hybridization with the polymorphic or variant sequence can also be used to 
determine its presence in a sample (e.g., by Southern blot, dot blot, etc.). The hybridization 
pattern of a polymorphic or variant sequence and a control sequence to an array of 
oligonucleotide probes immobilized on a solid support, as described in US 5,445,934, or in 
WO 95/35505, can also be used as a means of identifying polymorphic or variant sequences 
associated with disease. Single strand conformational polymorphism (SSCP) analysis, 
denaturing gradient gel electrophoresis (DGGE), and heteroduplex analysis in gel matrices are 
used to detect conformational changes created by DNA sequence variation as alterations in 
electrophoretic mobility. Alternatively, where a polymorphism creates or destroys a recognition 
site for a restriction endonuclease, the sample is digested with that endonuclease, and the 
products size fractionated to determine whether the fragment was digested. Fractionation is 
performed by gel or capillary electrophoresis, particularly acrylamide or agarose gels. 
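
The restriction-site approach described above can be illustrated by predicting the fragment 
sizes expected from a wild-type and a variant sequence. The following is a minimal sketch, 
assuming a linear fragment, a single strand read 5' to 3', and an enzyme that cuts at a fixed 
offset within its recognition site; the function name, the example sequences, and the use of 
the site GAATTC with an offset of 1 are illustrative assumptions only. 

```python
from typing import List


def fragment_lengths(sequence: str, site: str, cut_offset: int = 0) -> List[int]:
    """Return fragment lengths after cutting a linear sequence at each site.

    cut_offset is the number of bases from the start of the recognition
    site to the cut position on the strand given.
    """
    seq = sequence.upper()
    recognition = site.upper()
    cuts = []
    position = seq.find(recognition)
    while position != -1:
        cuts.append(position + cut_offset)
        position = seq.find(recognition, position + 1)
    bounds = [0] + cuts + [len(seq)]
    # Zero-length fragments (a cut at either end) are dropped.
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    wild_type = "AAAAGAATTCTTTT"  # contains one GAATTC site
    variant = "AAAAGAGTTCTTTT"    # a single base change destroys the site
    print(fragment_lengths(wild_type, "GAATTC", cut_offset=1))
    print(fragment_lengths(variant, "GAATTC", cut_offset=1))
```

In this example the wild-type sequence yields two fragments while the variant remains 
undigested, which is the size difference that fractionation would reveal. 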

Screening for mutations in a differentially expressed gene can be based on the 
functional or antigenic characteristics of the protein. Protein truncation assays are useful in 
detecting deletions that can affect the biological activity of the protein. Various immunoassays 
designed to detect polymorphisms in proteins can be used in screening. Where many diverse 
genetic mutations lead to a particular disease phenotype, functional protein assays have proven 
to be effective screening tools. The activity of the encoded protein can be determined by 
comparison with the wild-type protein. 

Pattern matching in diagnosis using arrays. In another embodiment, the diagnostic 
and/or prognostic methods of the invention involve detection of expression of a selected set of 
genes in a test sample to produce a test expression pattern (TEP). The TEP is compared to a 
reference expression pattern (REP), which is generated by detection of expression of the 
selected set of genes in a reference sample (e.g., a positive or negative control sample). The 
selected set of genes includes at least one of the genes of the invention, which genes correspond 
to the polynucleotide sequences of SEQ ID NOS: 1-844. Of particular interest is a selected set of 
genes that includes genes differentially expressed in the disease for which the test sample is to 
be screened. 

"Reference sequences" or "reference polynucleotides" as used herein in the context of 
differential gene expression analysis and diagnosis/prognosis refer to a selected set of 
polynucleotides, which selected set includes one or more of the differentially expressed 
polynucleotides described herein. A plurality of reference sequences, preferably comprising 
positive and negative control sequences, can be included as reference sequences. Additional 
suitable reference sequences are found in Genbank, Unigene, and other nucleotide sequence 
databases (including, e.g., expressed sequence tag (EST), partial, and full-length sequences). 

"Reference array" means an array having reference sequences for use in hybridization 
with a sample, where the reference sequences include all, at least one of, or any subset of the 
differentially expressed polynucleotides described herein. Usually such an array will include at 
least 3 different reference sequences, and can include any one or all of the provided 
differentially expressed sequences. Arrays of interest can further comprise sequences, including 
polymorphisms, of other genetic sequences, particularly other sequences of interest for 
screening for a disease or disorder (e.g., cancer, dysplasia, or other related or unrelated diseases, 
disorders, or conditions). The oligonucleotide sequence on the array will usually be at least 
about 12 nt in length, and can be of about the length of the provided sequences, or can extend 
into the flanking regions to generate fragments of 100 nt to 200 nt in length or more. 

A "reference expression pattern" or "REP" as used herein refers to the relative levels of 
expression of a selected set of genes, particularly of differentially expressed genes, that is 
associated with a selected cell type, e.g., a normal cell, a cancerous cell, a cell exposed to an 
environmental stimulus, and the like. A "test expression pattern" or "TEP" refers to relative 
levels of expression of a selected set of genes, particularly of differentially expressed genes, in a 
test sample (e.g., a cell of unknown or suspected disease state, from which mRNA is isolated). 

"Diagnosis" as used herein generally includes determination of a subject's susceptibility 
to a disease or disorder, determination as to whether a subject is presently affected by a disease 
or disorder, as well as to the prognosis of a subject affected by a disease or disorder (e.g., 
identification of pre-metastatic or metastatic cancerous states, stages of cancer, or 
responsiveness of cancer to therapy). The present invention particularly encompasses diagnosis 
of subjects in the context of breast cancer (e.g., carcinoma in situ (e.g., ductal carcinoma in situ), 
estrogen receptor (ER)-positive breast cancer, ER-negative breast cancer, or other forms and/or 
stages of breast cancer), lung cancer (e.g., small cell carcinoma, non-small cell carcinoma, 
mesothelioma, and other forms and/or stages of lung cancer), and colon cancer (e.g., 
adenomatous polyp, colorectal carcinoma, and other forms and/or stages of colon cancer). 

"Sample" or "biological sample" as used throughout here are generally meant to refer to 
samples of biological fluids or tissues, particularly samples obtained from tissues, especially 
from cells of the type associated with the disease for which the diagnostic application is 
designed (e.g., ductal adenocarcinoma), and the like. "Samples" is also meant to encompass 
derivatives and fractions of such samples (e.g., cell lysates). Where the sample is solid tissue, 
the cells of the tissue can be dissociated or tissue sections can be analyzed. 

REPs can be generated in a variety of ways according to methods well known in the art. 
For example, REPs can be generated by hybridizing a control sample to an array having a 
selected set of polynucleotides (particularly a selected set of differentially expressed 
polynucleotides), acquiring the hybridization data from the array, and storing the data in a 
format that allows for ready comparison of the REP with a TEP. Alternatively, all expressed 
sequences in a control sample can be isolated and sequenced, e.g., by isolating mRNA from a 
control sample, converting the mRNA into cDNA, and sequencing the cDNA. The resulting 
sequence information roughly or precisely reflects the identity and relative number of expressed 
sequences in the sample. The sequence information can then be stored in a format (e.g., a 
computer-readable format) that allows for ready comparison of the REP with a TEP. The REP 
can be normalized prior to or after data storage, and/or can be processed to selectively remove 
sequences of expressed genes that are of less interest or that might complicate analysis (e.g., 
some or all of the sequences associated with housekeeping genes can be eliminated from REP 
data). 

TEPs can be generated in a manner similar to REPs, e.g., by hybridizing a test sample to 
an array having a selected set of polynucleotides, particularly a selected set of differentially 
expressed polynucleotides, acquiring the hybridization data from the array, and storing the data 
in a format that allows for ready comparison of the TEP with a REP. The REP and TEP to be 
used in a comparison can be generated simultaneously, or the TEP can be compared to 
previously generated and stored REPs. 
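
The normalization and filtering steps described above for REPs and TEPs can be sketched as a 
simple data transformation. The following is a minimal illustration, assuming that raw signal 
values per gene are already available and that normalization is to the total retained signal; 
the function name, the choice of normalization, and the gene identifiers in the example are 
hypothetical and are not prescribed by the methods described herein. 

```python
from typing import Dict, FrozenSet


def build_expression_pattern(raw_signal: Dict[str, float],
                             exclude: FrozenSet[str] = frozenset()
                             ) -> Dict[str, float]:
    """Return a normalized expression pattern from raw per-gene signals.

    Genes listed in exclude (e.g., housekeeping genes) are removed first;
    the remaining signals are scaled so that they sum to 1.
    """
    kept = {gene: value for gene, value in raw_signal.items()
            if gene not in exclude}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no positive signal remains after filtering")
    return {gene: value / total for gene, value in kept.items()}


if __name__ == "__main__":
    raw = {"housekeeping1": 900.0, "gene1": 75.0, "gene2": 25.0}
    pattern = build_expression_pattern(raw, exclude=frozenset({"housekeeping1"}))
    print(pattern)
```

Applying the same function, with the same exclusion list, to both the reference sample and 
the test sample keeps the resulting REP and TEP directly comparable. 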

In one embodiment of the invention, comparison of a TEP with a REP involves 
hybridizing a test sample with a reference array, where the reference array has one or more 
reference sequences for use in hybridization with a sample. The reference sequences include all, 
at least one of, or any subset of the differentially expressed polynucleotides described herein. 
Hybridization data for the test sample is acquired, the data normalized, and the produced TEP 
compared with a REP generated using an array having the same or similar selected set of 
differentially expressed polynucleotides. Probes that correspond to sequences differentially 
expressed between the two samples will show decreased or increased hybridization efficiency 
for one of the samples relative to the other. 

Reference arrays can be produced according to any suitable methods known in the art. 
For example, methods of producing large arrays of oligonucleotides are described in 
U.S. 5,134,854, and U.S. 5,445,934 using light-directed synthesis techniques. Using a computer 
controlled system, a heterogeneous array of monomers is converted, through simultaneous 
coupling at a number of reaction sites, into a heterogeneous array of polymers. Alternatively, 
microarrays are generated by deposition of pre-synthesized oligonucleotides onto a solid 
substrate, for example as described in PCT published application no. WO 95/35505. 

Methods for collection of data from hybridization of samples with reference arrays are 
also well known in the art. For example, the polynucleotides of the reference and test samples 
can be generated using a detectable fluorescent label, and hybridization of the polynucleotides in 
the samples detected by scanning the microarrays for the presence of the detectable label. 
Methods and devices for detecting fluorescently marked targets on devices are known in the art. 
Generally, such detection devices include a microscope and light source for directing light at a 
substrate. A photon counter detects fluorescence from the substrate, while an x-y translation 
stage varies the location of the substrate. A confocal detection device that can be used in the 
subject methods is described in U.S. Patent no. 5,631,734. A scanning laser microscope is 
described in Shalon et al., Genome Res. (1996) 6:639. A scan, using the appropriate excitation 
line, is performed for each fluorophore used. The digital images generated from the scan are 
then combined for subsequent analysis. For any particular array element, the fluorescent signal 
from one sample (e.g., a test sample) is compared to the fluorescent signal from another sample 
(e.g., a reference sample), and the relative signal intensity determined as the ratio of the two 
signals. 
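
The per-element comparison described above can be illustrated as a ratio calculation. The 
following is a minimal sketch, assuming that background-corrected or raw fluorescence values 
are available for each array element in both channels; the function name, the optional 
uniform background subtraction, and the use of a base-2 logarithm are assumptions made for 
this example. 

```python
import math
from typing import Dict


def log2_ratios(test_signal: Dict[str, float],
                reference_signal: Dict[str, float],
                background: float = 0.0) -> Dict[str, float]:
    """Return log2(test/reference) for each array element present in both.

    Elements whose background-subtracted signal is not positive in either
    channel are skipped, because their ratio is not meaningful.
    """
    ratios = {}
    for element, test_value in test_signal.items():
        reference_value = reference_signal.get(element)
        if reference_value is None:
            continue
        test_net = test_value - background
        reference_net = reference_value - background
        if test_net <= 0 or reference_net <= 0:
            continue
        ratios[element] = math.log2(test_net / reference_net)
    return ratios


if __name__ == "__main__":
    test = {"spot1": 400.0, "spot2": 100.0, "spot3": 50.0}
    reference = {"spot1": 100.0, "spot2": 100.0, "spot3": 200.0}
    print(log2_ratios(test, reference))
```

On a logarithmic scale, equal increases and decreases in expression are symmetric about zero, 
which simplifies comparison across array elements. 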

Methods for analyzing the data collected from hybridization to arrays are well known in 
the art. For example, where detection of hybridization involves a fluorescent label, data analysis 
can include the steps of determining fluorescent intensity as a function of substrate position 
from the data collected, removing outliers, i.e., data deviating from a predetermined statistical 
distribution, and calculating the relative binding affinity of the targets from the remaining data. 
The resulting data can be displayed as an image with the intensity in each region varying 
according to the binding affinity between targets and probes. 
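
The outlier-removal step described above can be illustrated for the pixel intensities of a 
single array element. The following is a minimal sketch, assuming that an outlier is any pixel 
more than a chosen number of standard deviations from the mean of the element; this criterion, 
the default cutoff of 2 standard deviations, and the function name are assumptions made for 
this example, and other statistical criteria can equally be used. 

```python
import statistics
from typing import Sequence


def robust_spot_intensity(pixels: Sequence[float], max_sd: float = 2.0) -> float:
    """Return the mean pixel intensity after discarding outlying pixels.

    A pixel is discarded when it lies more than max_sd sample standard
    deviations from the mean of all pixels in the element.
    """
    if not pixels:
        raise ValueError("no pixel values were supplied")
    mean = statistics.fmean(pixels)
    if len(pixels) < 3:
        return mean  # too few values to judge outliers
    spread = statistics.stdev(pixels)
    if spread == 0:
        return mean
    kept = [p for p in pixels if abs(p - mean) <= max_sd * spread]
    return statistics.fmean(kept) if kept else mean


if __name__ == "__main__":
    # One saturated pixel (500) among otherwise consistent values.
    print(robust_spot_intensity([100, 102, 98, 101, 99, 100, 500]))
```

The filtered intensity for each element can then be used in place of the raw mean when 
relative signal between samples is calculated. 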

In general, the test sample is classified as having a gene expression profile corresponding 
to that associated with a disease or non-disease state by comparing the TEP generated from the 
test sample to one or more REPs generated from reference samples (e.g., from samples 
associated with cancer or specific stages of cancer, dysplasia, samples affected by a disease 
other than cancer, normal samples, etc.). The criteria for a match or a substantial match between 
a TEP and a REP include expression of the same or substantially the same set of reference 
genes, as well as expression of these reference genes at substantially the same levels (e.g., no 
significant difference between the samples for a signal associated with a selected reference 
sequence after normalization of the samples, or at least no greater than about 25% to about 40% 
difference in signal strength for a given reference sequence). In general, a pattern match between 
a TEP and a REP includes a match in expression, preferably a match in qualitative or 
quantitative expression level, of at least one of, all or any subset of the differentially expressed 
genes of the invention. 
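
The match criteria described above can be illustrated as a comparison of two normalized 
expression patterns. The following is a minimal sketch, assuming that a TEP and each REP are 
represented as mappings from reference gene to normalized signal, that a match requires the 
same set of genes, and that the per-gene difference is measured relative to the REP value; the 
function names, the data representation, and the default tolerance of 25% are assumptions made 
for this example. 

```python
from typing import Dict, Optional


def patterns_match(tep: Dict[str, float], rep: Dict[str, float],
                   max_rel_diff: float = 0.25) -> bool:
    """Return True when the TEP matches the REP within the tolerance.

    Both patterns must cover the same genes, and each TEP value must
    differ from the REP value by no more than max_rel_diff (fractional).
    """
    if set(tep) != set(rep):
        return False
    for gene, reference_level in rep.items():
        test_level = tep[gene]
        if reference_level == 0:
            if test_level != 0:
                return False
            continue
        if abs(test_level - reference_level) / reference_level > max_rel_diff:
            return False
    return True


def best_match(tep: Dict[str, float], reps: Dict[str, Dict[str, float]],
               max_rel_diff: float = 0.25) -> Optional[str]:
    """Return the label of the first REP that the TEP matches, or None."""
    for label, rep in reps.items():
        if patterns_match(tep, rep, max_rel_diff):
            return label
    return None


if __name__ == "__main__":
    reference_patterns = {
        "normal": {"gene1": 1.0, "gene2": 1.0},
        "cancer": {"gene1": 4.0, "gene2": 0.5},
    }
    test_pattern = {"gene1": 3.6, "gene2": 0.55}
    print(best_match(test_pattern, reference_patterns))
```

Raising `max_rel_diff` toward 0.40 loosens the criterion in line with the "about 25% to about 
40%" range recited above. 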

Pattern matching can be performed manually, or can be performed using a computer 
program. Methods for preparation of substrate matrices (e.g., arrays), design of oligonucleotides 
for use with such matrices, labeling of probes, hybridization conditions, scanning of hybridized 
matrices, and analysis of patterns generated, including comparison analysis, are described in, for 
example, U.S. 5,800,992. 

F. Use of the Polynucleotides of the Invention in Cancer 

Oncogenesis involves the unbridled growth, dedifferentiation and abnormal migration of 
cells. Cancerous cells can have the ability to compress, invade, and destroy normal tissue. 
Cancerous cells may also metastasize to other parts of the body via the bloodstream or the 
lymph system and colonize in these other areas. Different cancers are classified by the cell from 
which the cancerous cell is derived and by its cellular morphology and/or state of 
differentiation. 

Somatic genetic abnormalities cause cancer initiation and progression. Cancer generally 
is clonally formed, i.e. gain of function of oncogenes and loss of function of tumor suppressor 
genes within a single cell transform the cell to be cancerous, and that single cell grows and 
divides to form a cancerous lesion. The genes known to be involved in cancer initiation and 
progression are involved in numerous cellular functions, including developmental 
differentiation, cell cycle regulation, cell signaling, immunological response, DNA replication, 
and DNA repair. 

The identification and characterization of genetic or biochemical markers in blood or 
tissues that will detect the earliest changes along the carcinogenesis pathway and monitor the 
efficacy of various therapies and preventive interventions is a major goal of cancer research. 
Scientists have identified genetic changes in stool specimens that indicate the stages of colon 
cancer, and other biomarkers such as gene mutations, hormone receptors, proteins that inhibit 
metastasis, and enzymes that metabolize drugs are all being used to determine the severity and 
predict the course of breast, prostate, lung, and other cancers. 

Recent advances in understanding the pathogenesis of certain cancers have been helpful in 
determining patient treatment. The level of expression of certain polynucleotides can be 
indicative of a poorer prognosis, and can therefore warrant more aggressive chemo- or 
radio-therapy for a patient. 
The correlation of novel surrogate tumor specific features with response to treatment and 
outcome in patients has defined certain prognostic indicators that allow the design of tailored 
therapy based on the molecular profile of the tumor. These therapies include antibody 
targeting and gene therapy. Moreover, a promising level of one or more marker 
polynucleotides can provide impetus for not aggressively treating a particular patient, thus 
sparing the patient the deleterious side effects of aggressive therapy. Determining expression of 
certain polynucleotides and comparison of a patient's profile with known expression in normal 
tissue and variants of the disease allows a determination of the best possible treatment for a 
patient, both in terms of specificity of treatment and in terms of comfort level of the patient. 

Surrogate tumor markers, such as polynucleotide expression, can also be used to better 
classify, and thus diagnose and treat, different forms and disease states of cancer. Two 
classifications widely used in oncology that can benefit from identification of the expression 
levels of the polynucleotides of the invention are staging of the cancerous disorder, and 
grading the nature of the cancerous tissue. 

Staging. Staging is a process used by physicians to describe how advanced the 
cancerous state is in a patient. Staging assists the physician in determining a prognosis, 
planning treatment and evaluating the results of such treatment. Different staging systems are 
used for different types of cancer, but each generally involves the following determinations: the 
size or extent of the primary tumor, indicated by T; whether the cancer has spread to nearby 
lymph nodes, indicated by N; and whether the cancer has metastasized to more distant parts of 
the body, indicated by M. This system of staging is called the TNM system. Generally, if a 
cancer is only detectable in the area of the primary lesion without having spread to any lymph 
nodes it is called Stage I. If it has spread only to the closest lymph nodes, it is called Stage II. 
In Stage III, the cancer has generally spread to the lymph nodes in near proximity to the site of 
the primary lesion. Cancers that have spread to a distant part of the body, such as the liver, 
bone, brain or another site, are called Stage IV, the most advanced stage. 

Currently, the determination of staging is done using pathological techniques and is 
based more on the presence or absence of malignant tissue than on the characteristics of the 
tumor type. Presence or absence of malignant tissue is based primarily on the gross morphology 
of the cells in the areas biopsied. The polynucleotides of the invention can facilitate fine-tuning 
of the staging process by identifying markers for the aggressiveness of a cancer, e.g., the 
metastatic potential, as well as the presence in different areas of the body. Thus, detection in a 
Stage II cancer of a polynucleotide signifying a high metastatic potential can be used to 
reclassify a borderline Stage II tumor as a Stage III tumor, justifying more aggressive therapy. 
Conversely, the presence of a polynucleotide signifying a lower metastatic potential allows 
more conservative staging of a tumor. 
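
The general staging rules and the marker-based adjustment described above can be sketched as follows. This is only an illustrative simplification under stated assumptions: the boolean inputs, the "high"/"low" marker labels, and the rule that a borderline tumor moves up exactly one stage (never into Stage IV, which requires distant metastasis) are hypothetical choices, not a clinical staging system.

```python
def simplified_stage(in_closest_nodes, in_regional_nodes, distant_metastasis):
    """Assign Stage 1-4 from the generalized description of spread."""
    if distant_metastasis:
        return 4  # spread to a distant part of the body
    if in_regional_nodes:
        return 3  # spread to lymph nodes near the primary lesion
    if in_closest_nodes:
        return 2  # spread only to the closest lymph nodes
    return 1      # confined to the area of the primary lesion


def adjusted_stage(stage, borderline, marker_potential):
    """Resolve a borderline stage using a metastatic-potential marker.

    marker_potential is "high", "low", or None (no marker information).
    A borderline tumor with a high-potential marker is moved up one stage,
    capped at Stage 3; a low-potential marker keeps the conservative stage.
    """
    if borderline and marker_potential == "high" and stage < 3:
        return stage + 1
    return stage


# A borderline Stage II tumor expressing a high-metastatic-potential marker.
stage = simplified_stage(in_closest_nodes=True, in_regional_nodes=False,
                         distant_metastasis=False)
print(stage, adjusted_stage(stage, borderline=True, marker_potential="high"))
```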

Grading of cancers. Grade is a term used to describe how closely a tumor resembles 
normal tissue of its same type. Based on the microscopic appearance of a tumor, pathologists 
will identify the grade of a tumor based on parameters such as cell morphology, cellular 
organization, and other markers of differentiation. As a general rule, the grade of a tumor 
corresponds to its rate of growth or aggressiveness. That is, undifferentiated or high-grade 
tumors grow more quickly than well differentiated or low-grade tumors. Information about 
tumor grade is useful in planning treatment and predicting prognosis. 

The American Joint Committee on Cancer has recommended the following guidelines 
for grading tumors: GX, grade cannot be assessed; G1, well differentiated; G2, moderately 
well differentiated; G3, poorly differentiated; G4, undifferentiated. Although grading is 
used by pathologists to describe most cancers, it plays a more important role in treatment 
planning for certain types than for others. An example is the Gleason system that is specific for 
prostate cancer, which uses grade numbers to describe the degree of differentiation. Lower 
Gleason scores indicate well-differentiated cells. Intermediate scores denote tumors with 
moderately differentiated cells. Higher scores describe poorly differentiated cells. Grade is also 
important in some types of brain tumors and soft tissue sarcomas. 

The polynucleotides of the invention can be especially valuable in determining the grade 
of the tumor, as they not only can aid in determining the differentiation status of the cells of a 
tumor, but can also identify factors other than differentiation that are valuable in determining 
the aggressiveness of a tumor, such as metastatic potential. 

Familial Cancer Genes. A number of cancer syndromes are linked to Mendelian 
inheritance of a predisposition to develop particular cancers. The following table contains a list 
of cancer types that can be inherited, and for which the gene or genes responsible have been 
identified. Most of the cancer types listed can occur as part of several different genetic 
conditions, each caused by alterations in a different gene. 





Cancer Type                          Genetic Condition                                  Gene

Brain                                Li-Fraumeni syndrome                               TP53
                                     Neurofibromatosis 1                                NF1
                                     Neurofibromatosis 2                                NF2
                                     von Hippel-Lindau syndrome                         VHL
                                     Tuberous sclerosis 2                               TSC2

Breast                               Hereditary breast/ovarian cancer 1                 BRCA1
                                     Hereditary breast/ovarian cancer 2                 BRCA2
                                     Li-Fraumeni syndrome                               TP53
                                     Ataxia telangiectasia                              ATM

Colon                                Familial adenomatous polyposis (FAP)               APC
                                     Hereditary non-polyposis colon cancer (HNPCC) 1    hMSH2
                                     Hereditary non-polyposis colon cancer (HNPCC) 2    hMLH1
                                     Hereditary non-polyposis colon cancer (HNPCC) 3    hPMS1
                                     Hereditary non-polyposis colon cancer (HNPCC) 4    hPMS2

Endocrine (parathyroid,              Multiple endocrine neoplasia 1 (MEN1)              MEN1
pituitary, GI endocrine)

Endocrine (pheochromocytoma,         Multiple endocrine neoplasia 2 (MEN2)              RET
medullary thyroid)

Endometrial                          Hereditary non-polyposis colon cancer (HNPCC) 1    hMSH2
                                     Hereditary non-polyposis colon cancer (HNPCC) 2    hMLH1
                                     Hereditary non-polyposis colon cancer (HNPCC) 3    hPMS1
                                     Hereditary non-polyposis colon cancer (HNPCC) 4    hPMS2

Eye                                  Hereditary retinoblastoma                          RB1

Hematologic (lymphomas               Li-Fraumeni syndrome                               TP53
and leukemia)                        Ataxia telangiectasia                              ATM

Kidney                               Hereditary Wilms' tumor                            WT1
                                     von Hippel-Lindau syndrome                         VHL
                                     Tuberous sclerosis 2                               TSC2

Ovary                                Hereditary breast/ovarian cancer 1                 BRCA1
                                     Hereditary breast/ovarian cancer 2                 BRCA2

Sarcoma                              Hereditary retinoblastoma                          RB1
                                     Li-Fraumeni syndrome                               TP53
                                     Neurofibromatosis 1                                NF1

Skin                                 Hereditary melanoma 1                              CDKN2
                                     Hereditary melanoma 2                              CDK4
                                     Basal cell naevus (Gorlin) syndrome                PTCH

Stomach                              Hereditary non-polyposis colon cancer (HNPCC) 1    hMSH2
                                     Hereditary non-polyposis colon cancer (HNPCC) 2    hMLH1
                                     Hereditary non-polyposis colon cancer (HNPCC) 3    hPMS1
                                     Hereditary non-polyposis colon cancer (HNPCC) 4    hPMS2

The polynucleotides of the invention can be especially useful to monitor patients having any of 
the above syndromes to detect potentially malignant events at a molecular level before they are 
detectable at a gross morphological level. As can be seen from the table, a number of genes are 
involved in multiple forms of cancer. Thus, a polynucleotide of the invention identified as 
important for metastatic colon cancer can also have clinical implications for a patient diagnosed 
with stomach cancer or endometrial cancer. 

Lung Cancer. Lung cancer is one of the most common cancers in the United States, 
accounting for about 15 percent of all cancer cases, or 170,000 new cases each year. At this 
time, over half of the lung cancer cases in the United States are in men, but the number found in 
women is increasing and will soon equal that in men. Today more women die of lung cancer 
than of breast cancer. Lung cancer is especially difficult to diagnose and treat because of the 
large size of the lungs, which allows cancer to develop for years undetected. In fact, lung cancer 
can spread outside the lungs without causing any symptoms. Adding to the confusion, the most 
common symptom of lung cancer, a persistent cough, can often be mistaken for a cold or 
bronchitis. 

Although there are more than a dozen different kinds of lung cancer, the two main types 
of lung cancer are small cell and nonsmall cell, which encompass about 90% of all lung cancer 
cases. Small cell carcinoma (also called oat cell carcinoma), which usually starts in one of the 
larger bronchial tubes, grows fairly rapidly, and is likely to be large by the time of diagnosis. 
Nonsmall cell lung cancer (NSCLC) is made up of three general subtypes of lung cancer. 
Epidermoid carcinoma (also called squamous cell carcinoma) usually starts in one of the larger 
bronchial tubes and grows relatively slowly. The size of these tumors can range from very small 
to quite large. Adenocarcinoma starts growing near the outside surface of the lung and can vary 
in both size and growth rate. Some slowly growing adenocarcinomas are described as alveolar 
cell cancer. Large cell carcinoma starts near the surface of the lung, grows rapidly, and the 
growth is usually fairly large when diagnosed. Other less common forms of lung cancer are 
carcinoid, cylindroma, mucoepidermoid, and malignant mesothelioma. 

Currently, CT scans, MRIs, X-rays, sputum cytology, and biopsies are used to diagnose 
nonsmall cell lung cancer. The form and cellular origin of the lung cancer is diagnosed 
primarily by biopsy, either a surgical biopsy or a needle aspiration of lung tissue, and 
usually the biopsy is prompted by an abnormality identified on an X-ray. In some cases, 
sputum cytology can reveal lung cancers in patients with normal X-rays or can determine the 
type of lung cancer, but because it cannot pinpoint the tumor's location, a positive sputum 
cytology test is usually followed by further tests. Since these tests are based in large part on 
gross morphology of the tissue, the diagnosis of a particular kind of tumor is largely subjective, 
and the diagnosis can vary significantly between clinicians. 

The polynucleotides of the invention can be used to distinguish types of lung cancer as 
well as to identify traits specific to a certain patient's cancer. For example, if the patient's 
biopsy expresses a polynucleotide that is associated with a low metastatic potential, it may 
justify leaving a larger portion of the patient's lung in surgery to remove the lesion. 
Alternatively, a smaller lesion with expression of a polynucleotide that is associated with high 
metastatic potential may justify a more radical removal of lung tissue and/or the surrounding 
lymph nodes, even if no metastasis can be identified through pathological examination. 

Similarly, the expression of polynucleotides of the invention can be used in the 
diagnosis, prognosis and management of lung cancer. The differential expression of a 
polynucleotide in hyperplasia can be used as a diagnostic marker for metastatic lung cancer. 
The polynucleotides of the invention that would be especially useful for this purpose are those 
that exhibit differential expression between high metastatic versus low metastatic lung cancer, 
i.e., SEQ ID NOS: 9, 34, 42, 62, 74, 106, 119, 135, 154, 160, 260, 308, 323, 349, 361, 369, 371, 
381, 395, and 400. Malignant lung cancer with a higher metastatic potential can be 
detected using expression levels of any of these sequences alone or in combination with the 
levels of expression of other known genes. 

Breast Cancer. The National Cancer Institute (NCI) estimates that about 1 in 8 women 
in the United States will develop breast cancer during their lifetimes. Clinical breast examination 
and mammography are recommended as combined modalities for breast cancer screening, and 
the nature of the cancer will often depend upon the location of the tumor and the cell type from 
which the tumor is derived. The majority of breast cancers are adenocarcinoma subtypes, 
which can be summarized as follows: 

Ductal carcinoma in situ (DCIS): Ductal carcinoma in situ is the most common type of 
noninvasive breast cancer. In DCIS, the malignant cells have not spread through the walls 
of the ducts into the fatty tissue of the breast. Comedocarcinoma is a type of DCIS that is more 
likely than other types of DCIS to come back in the same area after lumpectomy. It is more 
closely linked to eventual development of invasive ductal carcinoma than other forms of DCIS. 

Infiltrating (or invasive) ductal carcinoma (IDC): This type of cancer has spread 
through the wall of the duct and invaded the fatty tissue of the breast. At this point, it has the 
potential to use the lymphatic system and bloodstream for metastasis to more distant parts of the 
body. Infiltrating ductal carcinoma accounts for about 80% of breast cancers. 

Lobular carcinoma in situ (LCIS): While not a true cancer, LCIS (also called lobular 
neoplasia) is sometimes classified as a type of noninvasive breast cancer. It does not penetrate 
through the wall of the lobules. Although it does not itself usually become an invasive cancer, 
women with this condition have a higher risk of developing an invasive breast cancer in the 
same breast, or in the opposite breast. 

Infiltrating (or invasive) lobular carcinoma (ILC): ILC is similar to IDC, in that it has the 
potential to metastasize elsewhere in the body. About 10% to 15% of invasive breast cancers are 
invasive lobular carcinomas. ILC can be more difficult to detect by mammogram than IDC. 

Inflammatory breast cancer: This rare type of invasive breast cancer accounts for about 
1% of all breast cancers and is extremely aggressive. Multiple skin symptoms associated with 
this cancer are caused by cancer cells blocking lymph vessels or channels in the skin over the 
breast. 

Medullary carcinoma: This special type of infiltrating breast cancer has a relatively well 
defined, distinct boundary between tumor tissue and normal tissue. It accounts for about 5% of 
breast cancers. The prognosis for this kind of breast cancer is better than for other types of 
invasive breast cancer. 

Mucinous carcinoma: This rare type of invasive breast cancer originates from mucus- 
producing cells. The prognosis for mucinous carcinoma is better than for the more common 
types of invasive breast cancer. 

Paget's disease of the nipple: This type of breast cancer starts in the ducts and spreads to 
the skin of the nipple and the areola. It is a rare type of breast cancer, occurring in only 1% of 
all cases. Paget's disease can be associated with in situ carcinoma, or with infiltrating breast 
carcinoma. If no lump can be felt in the breast tissue, and the biopsy shows DCIS but no 
invasive cancer, the prognosis is excellent. 

Phyllodes tumor: This very rare type of breast tumor forms from the stroma of the breast, 
in contrast to carcinomas, which develop in the ducts or lobules. Phyllodes (also spelled 
phylloides) tumors are usually benign; malignant phyllodes tumors are very rare, and fewer 
than 10 women per year in the US die of this disease. Benign phyllodes tumors are 
successfully treated by removing the mass and a narrow 
margin of normal breast tissue. 

Tubular carcinoma: Accounting for about 2% of all breast cancers, tubular carcinomas 
are a special type of infiltrating breast carcinoma. They have a better prognosis than usual 
infiltrating ductal or lobular carcinomas. 

High-quality mammography combined with clinical breast exam remains the only 
screening method clearly tied to reduction in breast cancer mortality. Lower dose x-rays, 
digitized computer images rather than film images, and the use of computer programs to assist 
diagnosis are almost ready for widespread dissemination. Other technologies also are being 
developed, including magnetic resonance imaging and ultrasound. In addition, a very low 
radiation exposure technique, positron emission tomography, has the potential for detecting early 
breast cancer. 

It is also possible to differentiate between non-cancerous breast tissue and malignant 
breast tissue by analyzing differential gene expression between tissues. In addition, there may 
be several possible alterations that lead to the various possible types of breast cancer. The 
different types of breast tumors (e.g., invasive vs. non-invasive, ductal vs. axillary lymph node) 
can be differentiable from one another by the identification of the differences in genes expressed 
by different types of breast tumor tissues (Porter-Jordan et al., Hematol Oncol Clin North Am 
(1994) 8:13). Breast cancer can thus be generally diagnosed by detection of expression of a 
gene or genes associated with breast tumors. Where enough information is available about the 
differential gene expression between various types of breast tumor tissues, the specific type of 
breast tumor can also be diagnosed. 

For example, increased estrogen receptor (ER) expression in normal breast epithelium, 
while not itself indicative of malignant tissue, is a known risk marker for development of breast 
cancer. Khan SA et al., Cancer Res (1994) 54:993. Malignant breast cancer is often divided 
into two groups, ER-positive and ER-negative, based on the estrogen receptor status of the 
tissue. The ER status represents different survival length and response to hormone therapy, and 
is thought to represent either: 1) an indicator of different stages of the disease, or 2) an indicator 
that allows differentiation between two similar but distinct diseases. K. Zhu et al., Med. 
Hypoth. (1997) 49:69. A number of other genes are known to vary expression between either 
different stages of cancer or different types of similar breast cancer. 

Similarly, the expression of polynucleotides of the invention can be used in the diagnosis 
and management of breast cancer. The differential expression of a polynucleotide in human 
breast tumor tissue can be used as a diagnostic marker for human breast cancer. The 
polynucleotides of the invention that would be especially useful for this purpose are those that 
exhibit differential expression between breast cancer tissue with a high metastatic potential and 
a low metastatic potential, i.e., SEQ ID NOS: 9, 42, 52, 62, 65, 66, 68, 114, 123, 144, 172, 178, 
214, 219, 223, 258, 317, and 379. Breast cancer can be detected using 
expression levels of any of these sequences alone or in combination. The 
aggressive nature and/or the metastatic potential of a breast cancer can also be determined by 
comparing levels of one or more polynucleotides of the invention with levels of 
another sequence known to vary in cancerous tissue, e.g., ER expression. In addition, 
development of breast cancer can be detected by examining the ratio of SEQ ID NO : to the 
levels of steroid hormones (e.g., testosterone or estrogen) or to other hormones (e.g., growth 
hormone, insulin). Thus expression of specific marker polynucleotides can be used to 
discriminate between normal and cancerous breast tissue, to discriminate between breast cancers 
with different cells of origin, to discriminate between breast cancers with different potential 
metastatic rates, etc. 

Diagnosis of breast cancer can also involve comparing the expression of a 
polynucleotide of the invention with the expression of other sequences in non-malignant breast 
tissue samples in comparison to one or more forms of the diseased tissue. A comparison of 
expression of one or more polynucleotides of the invention between the samples provides 
information on relative levels of these polynucleotides as well as the ratio of these 
polynucleotides to the expression of other sequences in the tissue of interest compared to 
normal. 

The risk of breast cancer is elevated significantly by the presence of an inherited risk for 
breast cancer, such as a mutation in BRCA-1 or BRCA-2. New diagnostic tools are being 
developed to address the needs of higher risk patients to complement mammography and 
physical examinations for early detection of breast cancer, particularly among younger women. 
The presence of antigen or expression markers in nipple aspirate fluid (NAF) samples collected 
from one or both breasts can be useful for risk assessment or early cancer detection. 
Breast cytology and biomarkers obtained by random fine needle aspiration have been used to 
identify hyperplasia with atypia and overexpression of p53 and EGFR. The polynucleotides of 
the invention can be used in multivariate analysis with expression studies with genes such as 
p53 and EGFR as risk predictors and as surrogate endpoint biomarkers for breast cancer. 

As well as being used for diagnosis and risk assessment, the expression of certain genes 
can also be correlated to prognosis of a disease state. The expression of particular genes has been 
used as a prognostic indicator for breast cancer, including increased expression of c-erbB-2, pS2, 
ER, progesterone receptor, epidermal growth factor receptor (EGFR), neu, myc, bcl-2, int2, 
cytosolic tyrosine kinase, cyclin E, prad-1, hst, uPA, PAI-1, PAI-2, cathepsin D, as well as the 
presence of a number of cancer-specific antigens, e.g., CEA, CA M26, CA M29 and CA 15.3. 
Davis, Br. J. Biomed Sci. (1996) 53:157. Poor prognosis has also been linked to a decrease in 
expression of certain genes, such as p53, Rb, nm23. The expression of the polynucleotides of 
the invention can be of prognostic value for determining the metastatic potential of a malignant 
breast cancer, as these molecules are differentially expressed between high and low metastatic 
potential tumors. The levels of these polynucleotides in patients with malignant breast 
cancer can be compared to normal tissue, malignant tissue with a known high potential metastatic 
level, and malignant tissue with a known lower level of metastatic potential to provide a 
prognosis for a particular patient. Such a prognosis is predictive of the extent and nature of the 
cancer. The determined prognosis is useful in guiding the management of a patient with breast 
cancer, both for initial treatment of the disease and for longer-term monitoring of the same 
patient. If samples are taken from the same individual over a period of time, differences in 
polynucleotide expression that are specific to that patient can be identified and closely watched. 

Colon Cancer. Colorectal cancer is one of the most common neoplasms in humans and 
perhaps the most frequent form of hereditary neoplasia. Prevention and early detection are key 
factors in controlling and curing colorectal cancer. Indeed, colorectal cancer is the second most 
preventable cancer, after lung cancer. Colorectal cancer begins as polyps, which are small, 
benign growths of cells that form on the inner lining of the colon. Over a period of several 
years, some of these polyps accumulate additional mutations and become cancerous. About 20 
percent of all cases of colon cancer are thought to be related to heredity. Currently, multiple 
familial colorectal cancer disorders have been identified, which are summarized as follows: 

Familial adenomatous polyposis (FAP): This condition results in a person having 
hundreds or even thousands of polyps in the colon and rectum that usually first appear during 
the teenage years. Cancer nearly always develops in one or more of these polyps between the 
ages of 30 and 50. 

Gardner's syndrome: Like FAP, Gardner's syndrome results in polyps and colorectal 
cancers that develop at a young age. It can also cause benign tumors of the skin, soft connective 
tissue and bones. 

Hereditary nonpolyposis colon cancer (HNPCC): People with this condition tend to 
develop colorectal cancer at a young age, without first having many polyps. HNPCC has an 
autosomal dominant pattern of inheritance with variable but high penetrance estimated to be 
about 90%. HNPCC underlies 0.5%-10% of all cases of colorectal cancer. An understanding of 
the mechanisms behind the development of HNPCC is emerging, and genetic presymptomatic 
testing, now being conducted in research settings, soon will be available on a widespread basis 
for individuals identified at risk for this disease. 

Familial colorectal cancer in Ashkenazi Jews: Recent research has found an inherited 
tendency to develop colorectal cancer among some Jews of Eastern European descent. Like 
people with FAP, Gardner's syndrome, and HNPCC, their increased risk is due to an inherited 
mutation present in about 6% of American Jews. 

Several tests are currently used to screen for colorectal cancer, including digital rectal 
examination, fecal occult blood test, sigmoidoscopy, colonoscopy, virtual colonoscopy and 
MRI. Each of these tests identifies potential colorectal cancer lesions, or a risk of development 
of these lesions, at a fairly gross morphological level. 

The sequential alteration of a number of genes is associated with malignant 
adenocarcinoma, including the genes DCC, p53, ras, and FAP. For a review, see, e.g., Fearon 
ER, et al., Cell (1990) 61(5):759; Hamilton SR et al., Cancer (1993) 72:957; Bodmer W, et al., 
Nat Genet. (1994) 4(3):217; Fearon ER, Ann N Y Acad Sci. (1995) 768:101. Molecular genetic 
alterations are thus promising as potential diagnostic and prognostic indicators in colorectal 
carcinoma, since it is possible to differentiate 
between different types of colorectal neoplasias using molecular markers. Colorectal cancer can 
thus be generally diagnosed by detection of expression of a gene or genes associated with 
colorectal tumors. 

Similarly, the expression of polynucleotides of the invention can be used in the 
diagnosis, prognosis and management of colorectal cancer. The differential expression of a 
polynucleotide in hyperplasia can be used as a diagnostic marker for colon cancer. The 
polynucleotides of the invention that would be especially useful for this purpose are those that 
exhibit differential expression between malignant metastatic colon cancer and normal patient 
tissue, i.e., SEQ ID NOS: 52, 119, 172, 288. Malignant colon cancer can be 
detected using expression levels of any of these sequences alone or in combination with the 
levels of expression of other known genes. 

The aggressive nature and/or the metastatic potential of a colon cancer 
can also be determined by comparing levels of one or more polynucleotides of the invention with 
total levels of another sequence known to vary in cancerous tissue, e.g., p53 
expression. In addition, development of colon cancer can be detected by examining the ratio of 
any of the polynucleotides of the invention to the levels of oncogenes (e.g., ras) or tumor 
suppressor genes (e.g., FAP or p53). Thus expression of specific marker polynucleotides can 
be used to discriminate between normal and cancerous colon tissue, to discriminate between 
colon cancers with different cells of origin, to discriminate between colon cancers with 
different potential metastatic rates, etc. 

G. Use of Polynucleotides to Screen for Peptide Analogs and Antagonists 

Polypeptides encoded by the instant polynucleotides and corresponding full length genes 
can be used to screen peptide libraries to identify binding partners, such as receptors, from 
among the encoded polypeptides. 

A library of peptides can be synthesized following the methods disclosed in U.S. Pat. 
No. 5,010,175 ('175), and in WO 91/17823. As described below in brief, one prepares a 
mixture of peptides, which is then screened to identify the peptides exhibiting the desired signal 
transduction and receptor binding activity. In the '175 method, a suitable peptide synthesis 
support (e.g., a resin) is coupled to a mixture of appropriately protected, activated amino acids. 
The concentration of each amino acid in the reaction mixture is balanced or adjusted in inverse 
proportion to its coupling reaction rate so that the product is an equimolar mixture of amino 
acids coupled to the starting resin. The bound amino acids are then deprotected, and reacted 
with another balanced amino acid mixture to form an equimolar mixture of all possible 
dipeptides. This process is repeated until a mixture of peptides of the desired length (e.g., 
hexamers) is formed. Note that one need not include all amino acids in each step: one can 
include only one or two amino acids in some steps (e.g., where it is known that a particular 
amino acid is essential in a given position), thus reducing the complexity of the mixture. After 
the synthesis of the peptide library is completed, the mixture of peptides is screened for binding 
to the selected polypeptide. The peptides are then tested for their ability to inhibit or enhance 
activity. Peptides exhibiting the desired activity are then isolated and sequenced. 
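
Two quantities in the preceding description lend themselves to a short worked sketch: the balancing of amino acid concentrations in inverse proportion to coupling rate, and the reduction in mixture complexity when some positions are restricted. The rate values and function names below are hypothetical and for illustration only; actual coupling rates would be determined experimentally.

```python
def balanced_fractions(coupling_rates):
    """Mole fractions inversely proportional to each amino acid's coupling rate.

    With these fractions, rate * fraction is the same for every amino acid,
    so each is expected to couple to the resin in equal amount.
    """
    inverse = {aa: 1.0 / rate for aa, rate in coupling_rates.items()}
    total = sum(inverse.values())
    return {aa: inv / total for aa, inv in inverse.items()}


def library_size(choices_per_position):
    """Number of distinct peptides given the amino acids allowed at each position."""
    size = 1
    for n in choices_per_position:
        size *= n
    return size


# Hypothetical relative coupling rates: Gly couples twice as fast as Val.
fractions = balanced_fractions({"Gly": 2.0, "Val": 1.0})
print(fractions)

# Full hexamer library versus one with positions 3 and 6 restricted.
print(library_size([20] * 6))
print(library_size([20, 20, 1, 20, 20, 2]))
```

In this example the slower-coupling amino acid is supplied at twice the concentration of the faster one, and restricting two of six positions reduces the mixture from 64,000,000 to 320,000 possible hexamers.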
The method described in WO 91/17823 is similar. However, instead of reacting the synthesis 
resin with a mixture of activated amino acids, the resin is divided into twenty equal portions (or 
into a number of portions corresponding to the number of different amino acids to be added in 
that step), and each amino acid is coupled individually to its portion of resin. The resin portions 
are then combined, mixed, and again divided into a number of equal portions for reaction with 
the second amino acid. In this manner, each reaction can be easily driven to completion. 
Additionally, one can maintain separate "subpools" by treating portions in parallel, rather than 
combining all resins at each step. This simplifies the process of determining which peptides are 
responsible for any observed receptor binding or signal transduction activity. 

In such cases, the subpools containing, e.g., 1-2,000 candidates each are exposed to one 
or more polypeptides of the invention. Each subpool that produces a positive result is then 
resynthesized as a group of smaller subpools (sub-subpools) containing, e.g., 20-100 candidates, 
and reassayed. Positive sub-subpools can be resynthesized as individual compounds, and 
assayed finally to determine the peptides that exhibit a high binding constant. These peptides 
can be tested for their ability to inhibit or enhance the native activity. The methods described in 
WO 91/17823 and U.S. Patent No. 5,194,392 (herein incorporated by reference) enable the 
preparation of such pools and subpools by automated techniques in parallel, such that all 
synthesis and resynthesis can be performed in a matter of days. 
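
The economy of the subpool approach can be illustrated by counting assays. The sketch below is an illustration under stated assumptions and does not describe the automated methods of the cited references: the function name, the library size of 160,000 candidates, and the numbers of positive pools are hypothetical, while the subpool sizes of 2,000 and 100 fall within the ranges given above.

```python
from math import ceil


def deconvolution_assays(library_size, subpool_size, sub_subpool_size,
                         positive_subpools, positive_sub_subpools):
    """Count the assays run in each of three rounds of subpool screening.

    Round 1 assays every subpool; round 2 assays the sub-subpools made
    from each positive subpool; round 3 assays the individual compounds
    of each positive sub-subpool.
    """
    round1 = ceil(library_size / subpool_size)
    round2 = positive_subpools * ceil(subpool_size / sub_subpool_size)
    round3 = positive_sub_subpools * sub_subpool_size
    return round1, round2, round3


# Hypothetical screen: 160,000 candidates, 3 positive subpools,
# 4 positive sub-subpools.
rounds = deconvolution_assays(160_000, 2_000, 100, 3, 4)
print(rounds, sum(rounds))
```

With these hypothetical numbers the three rounds require 80, 60, and 400 assays, 540 in total, rather than one assay for each of the 160,000 candidates.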

Peptide agonists or antagonists are screened using any available method, such as signal 
transduction, antibody binding, receptor binding, mitogenic assays, chemotaxis assays, etc. The 
methods described herein are presently preferred. The assay conditions ideally should resemble 
the conditions under which the native activity is exhibited in vivo, that is, under physiologic pH, 
temperature, and ionic strength. Suitable agonists or antagonists will exhibit strong inhibition or 
enhancement of the native activity at concentrations that do not cause toxic side effects in the 
subject. Agonists or antagonists that compete for binding to the native polypeptide can require 
concentrations equal to or greater than the native concentration, while inhibitors capable of 
binding irreversibly to the polypeptide can be added in concentrations on the order of the native 
concentration. 

The end results of such screening and experimentation will be at least one novel 
polypeptide binding partner, such as a receptor, encoded by a gene or a cDNA corresponding to 
a polynucleotide of the invention, and at least one peptide agonist or antagonist of the novel 
binding partner. Such agonists and antagonists can be used to modulate, enhance, or inhibit 
receptor function in cells to which the receptor is native, or in cells that possess the receptor as a 
result of genetic engineering. Further, if the novel receptor shares biologically important 
characteristics with a known receptor, information about agonist/antagonist binding can 
facilitate development of improved agonists/antagonists of the known receptor. 

H. Pharmaceutical Compositions and Therapeutic Uses 

Pharmaceutical compositions can comprise polypeptides, antibodies, or polynucleotides 
of the claimed invention. The pharmaceutical compositions will comprise a therapeutically 
effective amount of either polypeptides, antibodies, or polynucleotides of the claimed invention. 

The term "therapeutically effective amount" as used herein refers to an amount of a 
therapeutic agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a 
detectable therapeutic or preventative effect. The effect can be detected by, for example, 
chemical markers or antigen levels. Therapeutic effects also include reduction in physical 
symptoms, such as decreased body temperature. The precise effective amount for a subject will 
depend upon the subject's size and health, the nature and extent of the condition, and the 
therapeutics or combination of therapeutics selected for administration. Thus, it is not useful to 
specify an exact effective amount in advance. However, the effective amount for a given 
situation is determined by routine experimentation and is within the judgment of the clinician. 
For purposes of the present invention, an effective dose will generally be from about 0.01 mg/kg 
to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to 
which it is administered. 
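
As a purely arithmetic illustration of the per-kilogram ranges stated above, the total amount for a given subject is the dose per kilogram multiplied by body weight. The function name and the 70 kg body weight below are assumptions for illustration; the sketch is not dosing guidance, and the effective amount remains a matter for routine experimentation and the judgment of the clinician.

```python
def dose_range_mg(weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Convert a per-kilogram dose range into total milligrams for one subject."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return (low_mg_per_kg * weight_kg, high_mg_per_kg * weight_kg)


# Hypothetical 70 kg subject.
broad = dose_range_mg(70.0)               # 0.01 mg/kg to 50 mg/kg
narrow = dose_range_mg(70.0, 0.05, 10.0)  # 0.05 mg/kg to 10 mg/kg

print(broad, narrow)
```

For the assumed 70 kg subject the broader range corresponds to roughly 0.7 mg to 3,500 mg and the narrower range to roughly 3.5 mg to 700 mg.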

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. 
The term "pharmaceutically acceptable carrier" refers to a carrier for administration of a 
therapeutic agent, such as antibodies or a polypeptide, genes, and other therapeutic agents. The 
term refers to any pharmaceutical carrier that does not itself induce the production of antibodies 
harmful to the individual receiving the composition, and which can be administered without 
undue toxicity. Suitable carriers can be large, slowly metabolized macromolecules such as 
proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino 
acid copolymers, and inactive virus particles. Such carriers are well known to those of ordinary 
skill in the art. 

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts 
such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of 
organic acids such as acetates, propionates, malonates, benzoates, and the like. A thorough 
discussion of pharmaceutically acceptable excipients is available in Remington's 
Pharmaceutical Sciences (Mack Pub. Co., N.J. 1991). 

Pharmaceutically acceptable carriers in therapeutic compositions can include liquids 
such as water, saline, glycerol and ethanol. Auxiliary substances, such as wetting or 
emulsifying agents, pH buffering substances, and the like, can also be present in such vehicles. 
Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions or 
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to 
injection can also be prepared. Liposomes are included within the definition of a 
pharmaceutically acceptable carrier. 

Delivery Methods. Once formulated, the compositions of the invention can be 
(1) administered directly to the subject (e.g., as polynucleotide or polypeptides); (2) delivered ex 
vivo, to cells derived from the subject (e.g., as in ex vivo gene therapy); or (3) delivered in vitro 
for expression of recombinant proteins (e.g., polynucleotides). Direct delivery of the 
compositions will generally be accomplished by injection, either subcutaneously, 
intraperitoneally, intravenously or intramuscularly, or delivered to the interstitial space of a 
tissue. The compositions can also be administered into a tumor or lesion. Other modes of 
administration include oral and pulmonary administration, suppositories, and transdermal 
applications, needles, and gene guns or hyposprays. Dosage treatment can be a single dose 
schedule or a multiple dose schedule. 

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject 
are known in the art and described in e.g., International Publication No. WO 93/14778. 
Examples of cells useful in ex vivo applications include, for example, stem cells, particularly 
hematopoietic, lymph cells, macrophages, dendritic cells, or tumor cells. Generally, delivery of 
nucleic acids for both ex vivo and in vitro applications can be accomplished by, for example, 
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated 
transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in 
liposomes, and direct microinjection of the DNA into nuclei, all well known in the art. 

Once a gene corresponding to a polynucleotide of the invention has been found to 
correlate with a proliferative disorder, such as neoplasia, dysplasia, and hyperplasia, the disorder 
can be amenable to treatment by administration of a therapeutic agent based on the provided 
polynucleotide or corresponding polypeptide. 

Preparation of antisense polynucleotides is discussed above. Neoplasias that are treated 
with the antisense composition include, but are not limited to, cervical cancers, melanomas, 
colorectal adenocarcinomas, Wilms' tumor, retinoblastoma, sarcomas, myosarcomas, lung 
carcinomas, leukemias, such as chronic myelogenous leukemia, promyelocytic leukemia, 
monocytic leukemia, and myeloid leukemia, and lymphomas, such as histiocytic lymphoma. 
Proliferative disorders that are treated with the therapeutic composition include disorders such 
as anhydric hereditary ectodermal dysplasia, congenital alveolar dysplasia, epithelial dysplasia 
of the cervix, fibrous dysplasia of bone, and mammary dysplasia. Hyperplasias, for example, 
endometrial, adrenal, breast, prostate, or thyroid hyperplasias or pseudoepitheliomatous 
hyperplasia of the skin, are treated with antisense therapeutic compositions based upon a 
polynucleotide of the invention. Even in disorders in which mutations in the corresponding 
gene are not implicated, downregulation or inhibition of expression of a gene corresponding to a 
polynucleotide of the invention can have therapeutic application. For example, decreasing gene 
expression can help to suppress tumors in which enhanced expression of the gene is implicated. 

Both the dose of the antisense composition and the means of administration are 
determined based on the specific qualities of the therapeutic composition, the condition, age, 
and weight of the patient, the progression of the disease, and other relevant factors. 
Administration of the therapeutic antisense agents of the invention includes local or systemic 
administration, including injection, oral administration, particle gun or catheterized 
administration, and topical administration. Preferably, the therapeutic antisense composition 
contains an expression construct comprising a promoter and a polynucleotide segment of at least 
12, 22, 25, 30, or 35 contiguous nucleotides of the antisense strand of a polynucleotide disclosed 
herein. Within the expression construct, the polynucleotide segment is located downstream 
from the promoter, and transcription of the polynucleotide segment initiates at the promoter. 

Various methods are used to administer the therapeutic composition directly to a specific 
site in the body. For example, a small metastatic lesion is located and the therapeutic 
composition injected several times in several different locations within the body of the tumor. 
Alternatively, arteries which serve a tumor are identified, and the therapeutic composition 
injected into such an artery, in order to deliver the composition directly into the tumor. A tumor 
that has a necrotic center is aspirated and the composition injected directly into the now empty 
center of the tumor. The antisense composition is directly administered to the surface of the 
tumor, for example, by topical application of the composition. X-ray imaging is used to assist in 
certain of the above delivery methods. 

Receptor-mediated targeted delivery of therapeutic compositions containing an antisense 
polynucleotide, subgenomic polynucleotides, or antibodies to specific tissues is also used. 
Receptor-mediated DNA delivery techniques are described in, for example, Findeis et al., 
Trends Biotechnol. (1993) 11:202; Chiou et al., Gene Therapeutics: Methods And Applications 
Of Direct Gene Transfer (J.A. Wolff, ed.) (1994); Wu et al., J. Biol. Chem. (1988) 263:621; Wu 
et al., J. Biol. Chem. (1994) 269:542; Zenke et al., Proc. Natl. Acad. Sci. (USA) (1990) 87:3655; 
Wu et al., J. Biol. Chem. (1991) 266:338. Preferably, receptor-mediated targeted delivery of 
therapeutic compositions containing antibodies of the invention is used to deliver the antibodies 
to specific tissue. 

Therapeutic compositions containing antisense subgenomic polynucleotides are 
administered in a range of about 100 ng to about 200 mg of DNA for local administration in a 
gene therapy protocol. Concentration ranges of about 500 ng to about 50 mg, about 1 μg to 
about 2 mg, about 5 μg to about 500 μg, and about 20 μg to about 100 μg of DNA can also be 
used during a gene therapy protocol. Factors such as method of action and efficacy of 
transformation and expression are considerations which will affect the dosage required for 
ultimate efficacy of the antisense subgenomic polynucleotides. Where greater expression is 
desired over a larger area of tissue, larger amounts of antisense subgenomic polynucleotides or 
the same amounts readministered in a successive protocol of administrations, or several 
administrations to different adjacent or close tissue portions of, for example, a tumor site, may 
be required to effect a positive therapeutic outcome. In all cases, routine experimentation in 
clinical trials will determine specific ranges for optimal therapeutic effect. A more complete 
description of gene therapy vectors, especially retroviral vectors, is contained in U.S. Serial No. 
08/869,309, which is expressly incorporated herein, and in section G below. 

For polynucleotide-related genes encoding polypeptides or proteins with anti- 
inflammatory activity, suitable use, doses, and administration are described in U.S. Patent No. 
5,654,173. Therapeutic agents also include antibodies to proteins and polypeptides encoded by 
the polynucleotides of the invention and related genes, as described in U.S. Patent No. 
5,654,173. 

I. Gene Therapy 

The therapeutic polynucleotides and polypeptides of the present invention can be utilized 
in gene delivery vehicles. The gene delivery vehicle can be of viral or non-viral origin (see 
generally, Jolly, Cancer Gene Therapy (1994) 7:51; Kimura, Human Gene Therapy (1994) 
5:845; Connelly, Human Gene Therapy (1995) 7:185; and Kaplitt, Nature Genetics (1994) 
tf:148). Gene therapy vehicles for delivery of constructs including a coding sequence of a 
therapeutic of the invention can be administered either locally or systemically. These constructs 
can utilize viral or non-viral vector approaches. Expression of such coding sequences can be 
induced using endogenous mammalian or heterologous promoters. Expression of the coding 
sequence can be either constitutive or regulated. 

The present invention can employ recombinant retroviruses which are constructed to 
carry or express a selected nucleic acid molecule of interest. Retrovirus vectors that can be 
employed include those described in EP 0 415 731; WO 90/07936; WO 94/03622; WO 
93/25698; WO 93/25234; U.S. Patent No. 5,219,740; WO 93/11230; WO 93/10218; Vile and 
Hart, Cancer Res. (1993) 53:3860; Vile et al., Cancer Res. (1993) 53:962; Ram et al., Cancer 
Res. (1993) 53:83; Takamiya et al., J. Neurosci. Res. (1992) 33:493; Baba et al., J. Neurosurg. 
(1993) 79:729; U.S. Patent No. 4,777,127; GB Patent No. 2,200,651; and EP 0 345 242. 

Preferred recombinant retroviruses include those described in WO 91/02805. 

Packaging cell lines suitable for use with the above-described retroviral vector constructs 
can be readily prepared (see, e.g., WO 95/30763 and WO 92/05266), and used to create 
producer cell lines (also termed vector cell lines) for the production of recombinant vector 
particles. Within particularly preferred embodiments of the invention, packaging cell lines are 
made from human (such as HT1080 cells) or mink parent cell lines, thereby allowing production 
of recombinant retroviruses that can survive inactivation in human serum. 

The present invention also employs alphavirus-based vectors that can function as gene 
delivery vehicles. Such vectors can be constructed from a wide variety of alphaviruses, 
including, for example, Sindbis virus vectors, Semliki forest virus (ATCC VR-67; ATCC VR-1247), 
Ross River virus (ATCC VR-373; ATCC VR-1246) and Venezuelan equine encephalitis 
virus (ATCC VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532). Representative 
examples of such vector systems include those described in U.S. Patent Nos. 5,091,309; 
5,217,879; and 5,185,440; WO 92/10578; WO 94/21792; WO 95/27069; WO 95/27044; and 
WO 95/07994. Gene delivery vehicles of the present invention can also employ parvovirus such 
as adeno-associated virus (AAV) vectors. Representative examples include the AAV vectors 
disclosed by Srivastava in WO 93/09239, Samulski et al., J. Virol. (1989) 63:3822; Mendelson 
et al., Virol. (1988) 166:154; and Flotte et al., PNAS (1993) 90:10613. 

Representative examples of adenoviral vectors include those described by Berkner, 
Biotechniques (1988) 6:616; Rosenfeld et al., Science (1991) 252:431; WO 93/19191; Kolls et 
al., PNAS (1994) 91:215; Kass-Eisler et al., (1993) 55:2838; Guzman et al., Cir. Res. (1993) 
73:1202; Zabner et al., Cell (1993) 75:207; Li et al., Hum. Gene Ther. (1993) 4:403; Cailaud 
et al., Eur. J. Neurosci. (1993) 5:1287; Vincent et al., Nat. Genet. (1993) 5:130; Jaffe et al., 
Nat. Genet. (1992) 1:372; and Levrero et al., Gene (1991) 101:195. Exemplary adenoviral gene 
therapy vectors employable in this invention also include those described in WO 94/12649; 
WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655. Administration 
of DNA linked to killed adenovirus as described in Curiel, Hum. Gene Ther. (1992) 3:147 can 
be employed. 

Other gene delivery vehicles and methods can be employed, including polycationic 
condensed DNA linked or unlinked to killed adenovirus alone, for example Curiel, Hum. Gene 
Ther. (1992) 3:147; ligand linked DNA, for example see Wu, J. Biol. Chem. (1989) 264:16985; 
eukaryotic cell delivery vehicles cells, for example see U.S. Pat. No. 5,814,482; WO 95/07994; 
WO 96/17072; WO 95/30763; and WO 97/42338; deposition of photopolymerized hydrogel 
materials; hand-held gene transfer particle gun, as described in U.S. Patent No. 5,149,655; 
ionizing radiation as described in U.S. Patent No. 5,206,152 and in WO 92/11033; nucleic 
charge neutralization or fusion with cell membranes. Additional approaches are described in 
Philip, Mol. Cell Biol. (1994) 14:2411, and in Woffendin, Proc. Natl. Acad. Sci. (1994) 
91:1581. 

Naked DNA can also be employed. Exemplary naked DNA introduction methods are 
described in WO 90/11092 and U.S. Patent No. 5,580,859. Uptake efficiency can be improved 
using biodegradable latex beads. DNA coated latex beads are efficiently transported into cells 
after endocytosis initiation by the beads. The method can be improved further by treatment of 
the beads to increase hydrophobicity and thereby facilitate disruption of the endosome and 
release of the DNA into the cytoplasm. Liposomes that can act as gene delivery vehicles are 
described in U.S. Patent No. 5,422,120; WO 95/13796; WO 94/23697; WO 91/14445; and 
EP 0524968. 

Further non-viral delivery suitable for use includes mechanical delivery systems such as 
the approach described in Woffendin et al., Proc. Natl. Acad. Sci. USA (1994) 91(24):11581. 
Moreover, the coding sequence and the product of expression of such can be delivered through 
deposition of photopolymerized hydrogel materials. Other conventional methods for gene 
delivery that can be used for delivery of the coding sequence include, for example, use of hand- 
held gene transfer particle gun, as described in U.S. Patent No. 5,149,655; use of ionizing 
radiation for activating transferred gene, as described in U.S. Patent No. 5,206,152 and 
WO 92/11033. 

The present invention will now be illustrated by reference to the following examples 
which set forth particularly advantageous embodiments. However, it should be noted that these 
embodiments are illustrative and are not to be construed as restricting the invention in any way. 






EXAMPLES 

The present invention is now illustrated by reference to the following examples which 
set forth particularly advantageous embodiments. However, these embodiments are illustrative 
and are not meant to be construed as restricting the invention in any way. 

Example 1: Source of Biological Materials and Overview of Novel Polynucleotides 
Expressed by the Biological Materials 
Human colon cancer cell line KM12L4-A (Morikawa et al., Cancer Research 
(1988) 48:6863) was used to construct a cDNA library from mRNA isolated from the cells. As 
described in the above overview, a total of 4,693 sequences expressed by the KM12L4-A cell 
line were isolated and analyzed; most sequences were about 275-300 nucleotides in length. The 
KM12L4-A cell line is derived from the KM12C cell line. The KM12C cell line, which is 
poorly metastatic (low metastatic), was established in culture from a Dukes' stage B2 surgical 
specimen (Morikawa et al., Cancer Res. (1988) 48:6863). The KM12L4-A is a highly metastatic 
subline derived from KM12C (Yeatman et al., Nucl. Acids Res. (1995) 23:4007; Bao-Ling et al., 
Proc. Annu. Meet. Am. Assoc. Cancer Res. (1995) 27:3269). The KM12C and KM12C-derived 
cell lines (e.g., KM12L4, KM12L4-A, etc.) are well-recognized in the art as a model cell line 
for the study of colon cancer (see, e.g., Morikawa et al., supra; Radinsky et al., Clin. Cancer 
Res. (1995) 1:19; Yeatman et al., (1995) supra; Yeatman et al., Clin. Exp. Metastasis (1996) 
14:246). 

The sequences were first masked to eliminate low complexity sequences using the XBLAST 
masking program (Claverie "Effective Large-Scale Sequence Similarity Searches," In: 
Computer Methods for Macromolecular Sequence Analysis, Doolittle, ed., Meth. Enzymol. 
266:212-227 Academic Press, NY, NY (1996); see particularly Claverie, in "Automated DNA 
Sequencing and Analysis Techniques" Adams et al., eds., Chap. 36, p. 267 Academic Press, San 
Diego, 1994 and Claverie et al., Comput. Chem. (1993) 17:191). Generally, masking does not 
influence the final search results, except to eliminate sequences of relatively little interest due 
to their low complexity, and to eliminate multiple "hits" based on similarity to repetitive regions 
common to multiple sequences, e.g., Alu repeats. Masking resulted in the elimination of 43 
sequences. The remaining sequences were then used in a BLASTN vs. Genbank search with 
search parameters of greater than 70% overlap, 99% identity, and a p value of less than 
1 x 10^-40, which search 
resulted in the discarding of 1,432 sequences. Sequences from this search also were discarded if 
the inclusive parameters were met, but the sequence was ribosomal or vector-derived. 
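
The discard criterion described above can be sketched as a simple threshold test on the best database hit for each sequence. This is a minimal sketch under stated assumptions and not the procedure actually used: the Hit record, the function names, and the example values are hypothetical, and "99% identity" is read here as "at least 99%". The separate exclusion of ribosomal and vector-derived sequences is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    overlap: float   # fraction of the query covered by the alignment (0 to 1)
    identity: float  # fraction of identical positions in the alignment (0 to 1)
    p_value: float   # p value reported for the alignment


def matches_known_sequence(hit, min_overlap=0.70, min_identity=0.99, max_p=1e-40):
    """True when a hit meets all three thresholds, so the query is discarded."""
    return (hit.overlap > min_overlap
            and hit.identity >= min_identity
            and hit.p_value < max_p)


def keep_candidates(best_hits):
    """Return names of sequences retained; None means no database hit at all."""
    return [name for name, hit in best_hits.items()
            if hit is None or not matches_known_sequence(hit)]


# Hypothetical best hits for four query sequences.
example_hits = {
    "seq_a": Hit(0.95, 0.995, 1e-60),  # meets all thresholds: discarded
    "seq_b": Hit(0.50, 0.995, 1e-60),  # overlap too low: retained
    "seq_c": None,                     # no hit: retained
    "seq_d": Hit(0.95, 0.90, 1e-60),   # identity too low: retained
}
retained = keep_candidates(example_hits)
print(retained)
```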

The resulting sequences from the previous search were classified into three groups (1,2 
and 3 below) and searched in a BLASTX vs. NRP (non-redundant proteins) database search: 
(1) unknown (no hits in the Genbank search), (2) weak similarity (greater than 45% identity and 
p value of less than 1x10"), and (3) high similarity (greater than 60% overlap, greater than 
80% identity, and p value less than 1x10"). This search resulted in discard of 98 sequences as 
having greater than 70% overlap, greater than 99% identity, and p value of less than 1 x 10^-40. 

The remaining sequences were classified as unknown (no hits), weak similarity, and 
high similarity (parameters as above). Two searches were performed on these sequences. First, 
a BLAST vs. EST database search resulted in discard of 1771 sequences (sequences with greater 
than 99% overlap, greater than 99% similarity and a p value of less than 1 x 10^-40; sequences 
with a p value of less than 1x10" when compared to a database sequence of human origin 
were also excluded). Second, a BLASTN vs. Patent GeneSeq database search resulted in discard of 15 
sequences (greater than 99% identity; p value less than 1 x 10^-40; greater than 99% overlap). 

The remaining sequences were subjected to screening using other rules and redundancies 
in the dataset. Sequences with a p value of less than 1 x 10^-111 in relation to a database 
sequence of human origin were specifically excluded. The final result provided the 404 
sequences listed in the accompanying Sequence Listing. The Sequence Listing is arranged 
beginning with sequences with no similarity to any sequence in a database searched, and ending 
with sequences with the greatest similarity. Each identified polynucleotide represents sequence 
from at least a partial mRNA transcript. Polynucleotides that were determined to be novel were 
assigned a sequence identification number. 
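
The counts reported in this Example can be tallied as follows. The sketch assumes that the discards at each step are sequential and do not overlap, which the text suggests but does not state expressly; the step labels are shorthand introduced here.

```python
initial_sequences = 4693

# (step, number of sequences discarded), in the order described above.
discards = [
    ("XBLAST masking", 43),
    ("BLASTN vs. Genbank", 1432),
    ("BLASTX vs. NRP", 98),
    ("BLAST vs. EST", 1771),
    ("BLASTN vs. Patent GeneSeq", 15),
]

remaining = initial_sequences
for step, discarded in discards:
    remaining -= discarded
    print(f"{step}: {discarded} discarded, {remaining} remaining")

final_sequences = 404
removed_by_final_screen = remaining - final_sequences
print(f"final screening: {removed_by_final_screen} removed, {final_sequences} remaining")
```

On this reading, 1,334 sequences remained after the database searches, and the final screening by other rules and redundancies removed a further 930 to give the 404 sequences of the Sequence Listing.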

The novel polynucleotides were assigned sequence identification numbers SEQ ID 
NOS: 1-404. The DNA sequences corresponding to the novel polynucleotides are provided in 
the Sequence Listing. The majority of the sequences are presented in the Sequence Listing in 
the 5' to 3' direction. A small number, 25, are listed in the Sequence Listing in the 5' to 3' 
direction but the sequence as written is actually 3' to 5'. These sequences are readily identified 
with the designation "AR" in the Sequence Name in Table 1 (inserted before the claims). The 
sequences correctly listed in the 5' to 3' direction in the Sequence Listing are designated "AF." 
The Sequence Listing filed herewith therefore contains 25 sequences listed in the reverse order, 
namely SEQ ID NOS:47, 97, 137, 171, 173, 179, 182, 194, 200, 202, 213, 227, 258, 264, 275, 
302, 313, 324, 329, 330, 331, 338, 358, 379, and 404. 
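
For a reader working with the Sequence Listing computationally, the 25 "AR" entries can be restored to a true 5' to 3' reading by reversing the character order of the listed string. The sketch below is a minimal illustration and assumes that simple reversal, without complementation, is what the "AR" designation calls for; the function name and the short example string are hypothetical.

```python
# SEQ ID NOS identified above as listed in the reverse order ("AR").
REVERSED_SEQ_IDS = frozenset({
    47, 97, 137, 171, 173, 179, 182, 194, 200, 202, 213, 227, 258,
    264, 275, 302, 313, 324, 329, 330, 331, 338, 358, 379, 404,
})


def as_five_to_three(seq_id, listed_sequence):
    """Return the sequence read 5' to 3'.

    Entries designated "AR" are written 3' to 5' in the listing, so their
    characters are reversed; all other entries are returned unchanged.
    """
    if seq_id in REVERSED_SEQ_IDS:
        return listed_sequence[::-1]
    return listed_sequence


# Hypothetical four-base strings, used only to show the behavior.
print(as_five_to_three(47, "ACGT"))  # reversed entry
print(as_five_to_three(1, "ACGT"))   # entry already 5' to 3'
```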

Because the provided polynucleotides represent partial mRNA transcripts, two or more 
polynucleotides of the invention may represent different regions of the same mRNA transcript 
and the same gene. Thus, if two or more SEQ ID NOS: are identified as belonging to the same 
clone, then either sequence can be used to obtain the full-length mRNA or gene. 

In order to confirm the sequences of SEQ ID NOS:1-404, inserts of the clones 
corresponding to these polynucleotides were re-sequenced. These "validation" sequences are 
provided in SEQ ID NOS:405-800. These validation sequences were often longer than the 
original polynucleotide sequences they validate, and thus often provide additional sequence 
information. Validation sequences can be correlated with the original sequences they validate 
by identifying those sequences of SEQ ID NOS:1-404 and the validation sequences of SEQ ID 
NOS:405-800 that share the same clone name in Table 1. 

Example 2: Results of Public Database Search to Identify Function of Gene Products 
SEQ ID NOS:1-404, as well as the validation sequences SEQ ID NOS:405-800, were 
translated in all three reading frames to determine the best alignment with the individual 
sequences. These amino acid sequences and nucleotide sequences are referred to, generally, as 
query sequences, which are aligned with the individual sequences. Query and individual 
sequences were aligned using the BLAST programs, available over the world wide web at 
http://www.ncbi.nlm.nih.gov/BLAST/. Again the sequences were masked to various extents to 
prevent searching of repetitive sequences or poly-A sequences, using the XBLAST program for 
masking low complexity as described above in Example 1 . 
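
Translation of a nucleotide sequence in three reading frames, as referred to above, can be sketched as follows. This is an illustrative sketch only and is not the software used to prepare Table 2: it applies the standard genetic code to the strand as given, the function names are assumptions, codons containing characters other than A, C, G, T (or U) are rendered as "X", and stop codons as "*".

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, frame=0):
    """Translate one reading frame (0, 1, or 2) of a nucleotide sequence."""
    sequence = sequence.upper().replace("U", "T")
    codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(sequence):
    """Return the translations of frames 0, 1, and 2 of the given strand."""
    return [translate(sequence, frame) for frame in range(3)]


# Hypothetical nine-base query used only to show the behavior.
frames = three_frame_translation("ATGGCCTAA")
print(frames)
```

Each of the three translations can then serve as an amino acid query sequence against a protein database, alongside the nucleotide query itself.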

Table 2 (inserted before the claims) shows the results of the alignments. Table 2 refers 
to each sequence by its SEQ ID NO:, the accession numbers and descriptions of nearest 
neighbors from the Genbank and Non-Redundant Protein searches, and the p values of the 
search results. Table 1 identifies each SEQ ID NO: by SEQ name, clone ID, and cluster. As 
discussed above, a single cluster includes polynucleotides representing the same gene or gene 
family, and generally represents sequences encoding the same gene product. 

For each of SEQ ID NOS:1-800, the best alignment to a protein or DNA sequence is 
included in Table 2. The activity of the polypeptide encoded by SEQ ID NOS:1-800 is the same 
or similar to the nearest neighbor reported in Table 2. The accession number of the nearest 
neighbor is reported, providing a reference to the activities exhibited by the nearest neighbor. 
The search program and database used for the alignment also are indicated as well as a 
calculation of the p value. 

Full length sequences or fragments of the polynucleotide sequences of the nearest 
neighbors can be used as probes and primers to identify and isolate the full length sequence of 
SEQ ID NOS:1-800. The nearest neighbors can indicate a tissue or cell type to be used to 
construct a library for the full-length sequences of SEQ ID NOS:1-800. 

SEQ ID NOS:1-800 and the translations thereof may be human homologs of known 
genes of other species or novel allelic variants of known human genes. In such cases, these new 
human sequences are suitable as diagnostics or therapeutics. As diagnostics, the human 
sequences SEQ ID NOS:1-800 exhibit greater specificity in detecting and differentiating human 
cell lines and types than homologs of other species. The human polypeptides encoded by SEQ 
ID NOS:1-800 are likely to be less immunogenic when administered to humans than homologs 
from other species. Further, on administration to humans, the polypeptides encoded by SEQ ID 
NOS:1-800 can show greater specificity or can be better regulated by other human proteins than 
are homologs from other species. 

Example 3: Members of Protein Families 

After conducting a profile search as described in the specification above, several of the 
polynucleotides of the invention were found to encode polypeptides having characteristics of a 
polypeptide belonging to a known protein family (and thus represent new members of these 
protein families) and/or comprising a known functional domain (Table 3). Thus the invention 
encompasses fragments, fusions, and variants of such polynucleotides that retain biological 
activity associated with the protein family and/or functional domain identified herein. 

Table 3. Polynucleotides encoding gene products of a protein family or having a known
functional domain(s).

SEQ ID NO: | Biological Activity (Profile hit) | Start | Stop | Dir
24 | 4 transmembrane segments integral membrane proteins | 1218 | 578 | rev
41 | 4 transmembrane segments integral membrane proteins | 1086 | 413 | rev
101 | 4 transmembrane segments integral membrane proteins | 1206 | 544 | rev
157 | 4 transmembrane segments integral membrane proteins | 721 | 33 | rev
341 | 4 transmembrane segments integral membrane proteins | [illegible] | [illegible] | rev
[illegible] | 4 transmembrane segments integral membrane proteins | [illegible] | [illegible] | for
[illegible] | 4 transmembrane segments integral membrane proteins | [illegible] | [illegible] | for
[illegible] | 4 transmembrane segments integral membrane proteins | [illegible] | [illegible] | rev
24 | 7 transmembrane receptor (Secretin family) | [illegible] | 491 | rev
41 | 7 transmembrane receptor (Secretin family) | 1309 | [illegible] | rev
101 | 7 transmembrane receptor (Secretin family) | 1330 | 296 | rev
157 | 7 transmembrane receptor (Secretin family) | [illegible] | [illegible] | rev
291 | 7 transmembrane receptor (Secretin family) | [illegible] | [illegible] | rev
291 | 7 transmembrane receptor (Secretin family) | [illegible] | [illegible] | for
[illegible] | 7 transmembrane receptor (Secretin family) | [illegible] | [illegible] | for
[illegible] | 7 transmembrane receptor (Secretin family) | [illegible] | [illegible] | rev
[illegible] | 7 transmembrane receptor (Secretin family) | [illegible] | 970 | rev
[illegible] | 7 transmembrane receptor (Secretin family) | [illegible] | [illegible] | rev
116 | Ank repeat | 141 | [illegible] | for
251 | Ank repeat | [illegible] | [illegible] | for
251 | Ank repeat | [illegible] | [illegible] | for
63 | ATPases Associated with Various Cellular Activities | 543 | 60 | for
116 | ATPases Associated with Various Cellular Activities | 802 | 313 | for
134 | ATPases Associated with Various Cellular Activities | [illegible] | [illegible] | rev
136 | ATPases Associated with Various Cellular Activities | 712 | 163 | for
151 | ATPases Associated with Various Cellular Activities | 710 | [illegible] | [illegible]
151 | ATPases Associated with Various Cellular Activities | [illegible] | [illegible] | for
384 | ATPases Associated with Various Cellular Activities | [illegible] | 140 | for
404 | ATPases Associated with Various Cellular Activities | 704 | 52 | for
374 | Basic region plus leucine zipper transcription factors | 298 | 146 | for
97 | Bromodomain (conserved sequence found in human, Drosophila and yeast proteins.) | 230 | 63 | for
136 | EF-hand | 121 | 207 | for
242 | EF-hand | 238 | 155 | for
379 | EF-hand | 212 | 126 | for
308 | Eukaryotic aspartyl proteases | 1300 | 461 | rev
213 | GATA family of transcription factors | 720 | 211 | for
367 | G-protein alpha subunit | [illegible] | 467 | rev
188 | Phorbol esters/diacylglycerol binding | [illegible] | [illegible] | for
251 | Phorbol esters/diacylglycerol binding | [illegible] | [illegible] | for
202 | protein kinase | [illegible] | 1 | rev
202 | protein kinase | [illegible] | 1 | rev
315 | protein kinase | 739 | 158 | for
315 | protein kinase | 1023 | 197 | for
367 | protein kinase | 1046 | 285 | rev
397 | protein kinase | 511 | 6 | for
256 | Protein phosphatase 2C | 13 | 90 | for
256 | Protein phosphatase 2C | 163 | 86 | for
382 | Protein Tyrosine Phosphatase | 261 | 2 | for
[illegible] | [illegible] Domain | 141 | [illegible] | [illegible]
[illegible] | [illegible] Domain | [illegible] | 209 | for
[illegible] | Trypsin | [illegible] | [illegible] | rev
188 | WD domain, G-beta repeats | 480 | 382 | for
188 | WD domain, G-beta repeats | 206 | 117 | for
335 | WD domain, G-beta repeats | 3 | 92 | for
23 | wnt family of developmental signaling proteins | 1151 | 335 | rev
291 | wnt family of developmental signaling proteins | 779 | 89 | rev
291 | wnt family of developmental signaling proteins | 1347 | 382 | rev
324 | wnt family of developmental signaling proteins | 1180 | 499 | rev
330 | wnt family of developmental signaling proteins | 1180 | 499 | rev
341 | wnt family of developmental signaling proteins | 1399 | 560 | rev
353 | wnt family of developmental signaling proteins | [illegible] | 49 | rev
188 | WW/rsp5/WWP domain containing proteins | 431 | 354 | for
379 | WW/rsp5/WWP domain containing proteins | [illegible] | [illegible] | for
[illegible] | Zinc finger, C2H2 type | 254 | 192 | [illegible]
306 | Zinc finger, C2H2 type | 428 | 367 | for
386 | Zinc finger, C2H2 type | 191 | 253 | for
322 | Zinc finger, CCHC class | 553 | 503 | for
306 | Zinc-binding metalloprotease domain | 101 | 60 | rev
395 | Zinc-binding metalloprotease domain | 28 | 69 | rev


Start and stop indicate the positions within the individual sequences that align with the
query sequence having the indicated SEQ ID NO. The direction (Dir) indicates the orientation
of the query sequence with respect to the individual sequence, where forward (for) indicates that
the alignment is in the same direction (left to right) as the sequence provided in the Sequence
Listing and reverse (rev) indicates that the alignment is with a sequence complementary to the
sequence provided in the Sequence Listing.
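
As an illustration of how the Start, Stop, and Dir columns might be applied to a sequence from the Sequence Listing, the following is a minimal Python sketch. The helper names (`reverse_complement`, `aligned_region`) are hypothetical, and the coordinate convention (1-based, inclusive, accepted in either order) is an assumption that is not stated in the specification.

```python
# Minimal sketch; the 1-based inclusive coordinate convention is an assumption.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def aligned_region(seq: str, start: int, stop: int, direction: str) -> str:
    """Extract the aligned region described by one table row (hypothetical helper).

    Coordinates are treated as 1-based and inclusive, in either order.
    'for' returns the region as listed; 'rev' returns its reverse complement.
    """
    lo, hi = sorted((start, stop))
    if lo < 1 or hi > len(seq):
        raise ValueError("coordinates fall outside the sequence")
    region = seq[lo - 1:hi]
    if direction == "for":
        return region
    if direction == "rev":
        return reverse_complement(region)
    raise ValueError(f"unknown direction: {direction!r}")
```

For a row marked "rev", the sketch returns the complementary strand of the listed region, which matches the description of a reverse alignment given above.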

Some polynucleotides exhibited multiple profile hits because, for example, the particular 
sequence contains overlapping profile regions, and/or the sequence contains two different 
functional domains. These profile hits are described in more detail below. 

a) Four Transmembrane Integral Membrane Proteins. SEQ ID NOS: 24, 41, 101, 157,
341, and 395 correspond to a sequence encoding a polypeptide that is a member of the 4
transmembrane segments integral membrane protein family (transmembrane 4 family). The
transmembrane 4 family of proteins includes a number of evolutionarily-related eukaryotic cell
surface antigens (Levy et al., J. Biol. Chem. (1991) 266:14591; Tomlinson et al., Eur. J.
Immunol. (1993) 23:136; Barclay et al., The leucocyte antigen factbooks. (1993) Academic
Press, London/San Diego). The proteins belonging to this family include: 1) Mammalian
antigen CD9 (MIC3), which is involved in platelet activation and aggregation; 2) Mammalian
leukocyte antigen CD37, expressed on B lymphocytes; 3) Mammalian leukocyte antigen CD53
(OX-44), which is implicated in growth regulation in hematopoietic cells; 4) Mammalian
lysosomal membrane protein CD63 (melanoma-associated antigen ME491; antigen AD1); 5)
Mammalian antigen CD81 (cell surface protein TAPA-1), which is implicated in regulation of
lymphoma cell growth; 6) Mammalian antigen CD82 (protein R2; antigen C33; Kangai 1
(KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for the
TCR/CD3 pathway; 7) Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan
antigen 3 (PETA-3)); 8) Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1);
9) Mammalian novel antigen 2 (NAG-2); 10) Human tumor-associated antigen CO-029; and 11)
Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23).

The members of the 4 transmembrane family share several characteristics. First, they all
are apparently type III membrane proteins, which are integral membrane proteins containing an
N-terminal membrane-anchoring domain which is not cleaved during biosynthesis and which
functions both as a translocation signal and as a membrane anchor. The family members also
contain three additional transmembrane regions and at least seven conserved cysteine residues, and
are of approximately the same size (218 to 284 residues). These proteins are collectively known
as the "transmembrane 4 superfamily" (TM4) because they span the plasma membrane four times.
A schematic diagram of the domain structure of these proteins is as follows:

|  | TMa | Extra | TM2 | Cyt | TM3 | Extracellular | TM4 | Cyt |
+--+-----+-------+-----+-C-C-+-----+-CC---C---C----+--C--+-----+
                       *******

where Cyt is the cytoplasmic domain, TMa is the transmembrane anchor, TM2 to TM4
represent transmembrane regions 2 to 4, 'C' are conserved cysteines, and '*' indicates the
position of the consensus pattern. The consensus pattern spans a conserved region including
two cysteines located in a short cytoplasmic loop between two transmembrane domains:
Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-
x(2)-[CWN]-[LIVM](2).
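
The consensus patterns quoted in this section use a compact notation in which `x` stands for any residue, `[...]` lists permitted residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. The following Python sketch shows one way such a pattern could be translated into a regular expression and used to scan a peptide. The function name `prosite_to_regex` and the test peptide are hypothetical, and the sketch assumes only the notation elements just listed.

```python
import re

# One pattern element: x, a single residue, a [allowed] set or an {excluded} set,
# optionally followed by a repeat count such as (3) or (1,3).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern into a Python regular expression (sketch)."""
    # Drop line breaks and spaces left by wrapping, and any trailing full stop.
    compact = "".join(pattern.split()).rstrip(".")
    pieces = []
    for element in compact.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            regex = core                     # single residue or [allowed] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(regex)
    return "".join(pieces)


TM4_PATTERN = ("G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-"
               "x(2)-[CWN]-[LIVM](2).")
TM4_REGEX = prosite_to_regex(TM4_PATTERN)

# Made-up peptide constructed to satisfy the pattern; it is not a real protein.
EXAMPLE_PEPTIDE = "GAAALAAGLLGCAGSAAEAACLL"
EXAMPLE_HIT = re.search(TM4_REGEX, EXAMPLE_PEPTIDE)
```

Because the translation is purely syntactic, the same helper can be applied to the other consensus patterns given in this section.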

b) Seven Transmembrane Integral Membrane Proteins. SEQ ID NOS: 24, 41, 101, 157,
291, 305, 315, and 341 correspond to a sequence encoding a polypeptide that is a member of the
seven transmembrane receptor family. G-protein coupled receptors (Strosberg, Eur. J. Biochem.
(1991) 196:1; Kerlavage, Curr. Opin. Struct. Biol. (1991) 1:394; Probst et al., DNA Cell
Biol. (1992) 11:1; and Savarese et al., Biochem. J. (1992) 293:1) (also called R7G) are an
extensive group of hormone, neurotransmitter, odorant and light receptors which transduce
extracellular signals by interaction with guanine nucleotide-binding (G) proteins. The tertiary
structure of these receptors is thought to be highly similar. They have seven hydrophobic
regions, each of which most probably spans the membrane. The N-terminus is located on the
extracellular side of the membrane and is often glycosylated, while the C-terminus is
cytoplasmic and generally phosphorylated. Three extracellular loops alternate with three
intracellular loops to link the seven transmembrane regions. Most, but not all, of these receptors
lack a signal peptide. The most conserved parts of these proteins are the transmembrane regions
and the first two cytoplasmic loops. A conserved acidic-Arg-aromatic triplet is present in the
N-terminal extremity of the second cytoplasmic loop (Attwood et al., Gene (1991) 98:153) and
could be implicated in the interaction with G proteins.

To detect this widespread family of proteins a pattern is used that contains the conserved 
triplet and that also spans the major part of the third transmembrane helix. Additional 
information about the seven transmembrane receptor family, and methods for their identification 
and use, is found in U.S. Patent No. 5,759,804. Due in part to their expression on the cell 
surface and other attractive characteristics, seven transmembrane protein family members are of 
particular interest as drug targets, as surface antigen markers, and as drug delivery targets (e.g., 
using antibody-drug complexes and/or use of anti-seven transmembrane protein antibodies as 
therapeutics in their own right). 

c) Ank Repeats. SEQ ID NOS: 116 and 251 represent polynucleotides encoding Ank
repeat-containing proteins. The ankyrin motif is a 33 amino acid sequence named after the
protein ankyrin, which has 24 tandem 33-amino-acid motifs. Ank repeats were originally
identified in the cell-cycle-control protein cdc10 (Breeden et al., Nature (1987) 329:651).
Proteins containing ankyrin repeats include ankyrin, myotrophin, I-kappaB proteins, cell cycle
protein cdc10, the Notch receptor (Matsuno et al., Development (1997) 124(21):4265), G9a (or
BAT8) of the class III region of the major histocompatibility complex (Biochem. J. (1993)
290:811-818), FABP, GABP, 53BP2, Lin12, glp-1, SWI4, and SWI6. The functions of the ankyrin
repeats are compatible with a role in protein-protein interactions (Bork, Proteins (1993)
17(4):363; Lambert and Bennett, Eur. J. Biochem. (1993) 211:1; Kerr et al., Current Op. Cell
Biol. (1992) 4:496; Bennett et al., J. Biol. Chem. (1980) 255:6424).

The 90 kD N-terminal domain of ankyrin contains a series of 24 33-amino-acid ank
repeats (Lux et al., Nature (1990) 344:36-42; Lambert et al., PNAS USA (1990) 87:1730).
The 24 ank repeats form four folded subdomains of 6 repeats each. These four repeat
subdomains mediate interactions with at least 7 different families of membrane proteins.
Ankyrin contains two separate binding sites for anion exchanger dimers. One site utilizes repeat
subdomain two (repeats 7-12) and the other requires both repeat subdomains 3 and 4 (repeats
13-24). Since the anion exchangers exist in dimers, ankyrin binds 4 anion exchangers at the
same time (Michaely and Bennett, J. Biol. Chem. (1995) 270(37):22050). The repeat motifs
are involved in ankyrin interaction with tubulin, spectrin, and other membrane proteins (Lux et
al., Nature (1990) 344:36).

The Rel/NF-kappaB/Dorsal family of transcription factors has activity that is controlled
by sequestration in the cytoplasm in association with inhibitory proteins referred to as I-kappaB
(Gilmore, Cell (1990) 62:841; Nolan and Baltimore, Curr. Opin. Genet. Dev. (1992) 2:211;
Baeuerle, Biochim. Biophys. Acta (1991) 1072:63; Schmitz et al., Trends Cell Biol. (1991)
1:130). I-kappaB proteins contain 5 to 8 copies of 33 amino acid ankyrin repeats, and certain
NF-kappaB/rel proteins are also regulated by cis-acting ankyrin repeat containing domains,
including p105 NF-kappaB, which contains a series of ankyrin repeats (Diehl and Hannink, J.
Virol. (1993) 67(12):7161). The I-kappaBs and Cactus (also containing ankyrin repeats) inhibit
activators through differential interactions with the Rel-homology domain. The gene family
includes proto-oncogenes, thus broadly implicating I-kappaB in the control of both normal gene
expression and the aberrant gene expression that makes cells cancerous (Nolan and Baltimore,
Curr. Opin. Genet. Dev. (1992) 2(2):211-220). In the case of rel/NF-kappaB and
pp40/I-kappaBβ, both the ankyrin repeats and the carboxy-terminal domain are required for
inhibiting DNA-binding activity and direct association of pp40/I-kappaBβ with rel/NF-kappaB
protein. The ankyrin repeats and the carboxy-terminal domain of pp40/I-kappaBβ form a
structure that associates with the rel homology domain to inhibit DNA binding activity (Inoue
et al., PNAS USA (1992) 89:4333).

The 4 ankyrin repeats in the amino terminus of the transcription factor subunit GABPβ
are required for its interaction with the GABPα subunit to form a functional high affinity
DNA-binding protein. These repeats can be crosslinked to DNA when GABP is bound to its target
sequence (Thompson et al., Science (1991) 253:162; LaMarco et al., Science (1991) 255:789).

Myotrophin, a 12.5 kDa protein having a key role in the initiation of cardiac
hypertrophy, comprises ankyrin repeats. The ankyrin repeats are characteristic of a hairpin-like
protruding tip followed by a helix-turn-helix motif. The V-shaped helix-turn-helix of the
repeats stack sequentially in bundles and are stabilized by compact hydrophobic cores, whereas
the protruding tips are less ordered.

d) ATPases Associated with Various Cellular Activities (AAA). SEQ ID NOS: 63, 116,
134, 136, 151, 384, and 404 represent polynucleotides encoding novel members of the "ATPases
Associated with diverse cellular Activities" (AAA) protein family. The AAA protein family is
composed of a large number of ATPases that share a conserved region of about 220 amino acids
that contains an ATP-binding site (Froehlich et al., J. Cell Biol. (1991) 114:443; Erdmann et al.,
Cell (1991) 64:499; Peters et al., EMBO J. (1990) 9:1757; Kunau et al., Biochimie (1993)
75:209-224; Confalonieri et al., BioEssays (1995) 17:639;
http://yeamob.pci.chemie.uni-tuebingen.de/AAA/Description.html). The proteins that belong to
this family contain either one or two AAA domains.

Proteins containing two AAA domains include: 1) Mammalian and Drosophila NSF
(N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18, which are involved in
intracellular transport between the endoplasmic reticulum and Golgi, as well as between
different Golgi cisternae; 2) Mammalian transitional endoplasmic reticulum ATPase (previously
known as p97 or VCP), which is involved in the transfer of membranes from the endoplasmic
reticulum to the Golgi apparatus. This ATPase forms a ring-shaped homooligomer composed of
six subunits. The yeast homolog, CDC48, plays a role in spindle pole proliferation; 3) Yeast
protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia
pastoris; 4) Yeast protein AFG2; 5) Sulfolobus acidocaldarius protein SAV and Halobacterium
salinarium cdcH, which may be part of a transduction pathway connecting light to cell division.

Proteins containing a single AAA domain include: 1) Escherichia coli and other bacteria
ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that degrades the
heat-shock sigma-32 factor, and is an integral membrane protein with a large cytoplasmic C-terminal
domain that contains both the AAA and the protease domains; 2) Yeast protein YME1, a protein
important for maintaining the integrity of the mitochondrial compartment. YME1 is also a
zinc-dependent protease; 3) Yeast protein AFG3 (or YTA10). This protein also contains an AAA
domain followed by a zinc-dependent protease domain; 4) Subunits from the regulatory complex of
the 26S proteasome (Hilt et al., Trends Biochem. Sci. (1996) 21:96), which is involved in the
ATP-dependent degradation of ubiquitinated proteins, which subunits include: a) Mammalian
subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene
mts2); b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene
YTA2); c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene
CIM5 or YTA3); d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and
in yeast (SUG1 or CIM3 or TBY1) and fission yeast (gene let1); e) Other probable subunits
include human TBP1, which influences HIV gene expression by interacting with the virus tat
transactivator protein, and yeast YTA1 and YTA6; 5) Yeast protein BCS1, a mitochondrial
protein essential for the expression of the Rieske iron-sulfur protein; 6) Yeast protein MSP1, a
protein involved in intramitochondrial sorting of proteins; 7) Yeast protein PAS8, and the
corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica; 8)
Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06); 9) Caenorhabditis elegans
meiotic spindle formation protein mei-1; 10) Yeast protein SAP1; 11) Yeast protein YTA7; and
12) Mycobacterium leprae hypothetical protein A2126A.

In general, the AAA domains in these proteins act as ATP-dependent protein
clamps (Confalonieri et al., BioEssays (1995) 17:639). In addition to the ATP-binding 'A' and 'B'
motifs, which are located in the N-terminal half of this domain, there is a highly conserved
region located in the central part of the domain which was used in the development of the
signature pattern. The consensus pattern is: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-
[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R.

e) Basic Region Plus Leucine Zipper Transcription Factors. SEQ ID NO:374 corresponds
to a polynucleotide encoding a novel member of the family of basic region plus leucine zipper
transcription factors. The bZIP superfamily (Hurst, Protein Prof. (1995) 2:105; and
Ellenberger, Curr. Opin. Struct. Biol. (1994) 4:12) of eukaryotic DNA-binding transcription
factors encompasses proteins that contain a basic region mediating sequence-specific
DNA-binding followed by a leucine zipper required for dimerization. Members of the family include
transcription factor AP-1, which binds selectively to enhancer elements in the cis control
regions of SV40 and metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog of
the avian sarcoma virus 17 (ASV17) oncogene v-jun.

Other members of this protein family include jun-B and jun-D, probable transcription
factors that are highly similar to jun/AP-1; the fos protein, a proto-oncogene that forms a
non-covalent dimer with c-jun; the fos-related proteins fra-1 and fos B; and mammalian cAMP
response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5,
ATF-6 and LRF-1. The consensus pattern for this protein family is: [KR]-x(1,3)-[RKSAQ]-N-
x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

f) Bromodomain. SEQ ID NO: 97 corresponds to a polynucleotide encoding a
polypeptide having a bromodomain region (Haynes et al., 1992, Nucleic Acids Res. 20:2693-
2603, Tamkun et al., 1992, Cell 68:561-572, and Tamkun, 1995, Curr. Opin. Genet. Dev. 5:473-
477), which is a conserved region of about 70 amino acids found in the following proteins:

1) Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated
factor p250) (gene CCG1); P250 is associated with the TFIID TATA-box binding protein and
seems essential for progression of the G1 phase of the cell cycle; 2) Human RING3, a protein
of unknown function encoded in the MHC class II locus; 3) Mammalian CREB-binding protein
(CBP), which mediates cAMP-gene regulation by binding specifically to phosphorylated CREB
protein; 4) Mammalian homologs of brahma, including three brahma-like human proteins:
SNF2a (hBRM), SNF2b, and BRG1; 5) Human BS69, a protein that binds to adenovirus E1A
and inhibits E1A transactivation; 6) Human peregrin (or Br140).

The bromodomain is thought to be involved in protein-protein interactions and may be
important for the assembly or activity of multicomponent complexes involved in transcriptional
activation. The consensus pattern, which spans a major part of the bromodomain, is:
[STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-[LIVM]-
x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY].

g) EF-Hand. SEQ ID NOS:136, 242, and 379 correspond to polynucleotides encoding a
novel protein in the family of EF-hand proteins. Many calcium-binding proteins belong to the
same evolutionary family and share a type of calcium-binding domain known as the EF-hand
(Kawasaki et al., Protein. Prof. (1995) 2:305-490). This type of domain consists of a twelve
residue loop flanked on both sides by a twelve residue alpha-helical domain. In an EF-hand
loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues
involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y,
Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding
Ca (bidentate ligand).
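
The position labelling described above can be sketched in Python as follows. The function names are hypothetical, and the example loop (written here as DKDGDGTITTKE, in the form commonly given for the first calcium-binding loop of calmodulin) is an illustrative assumption rather than a sequence taken from the Sequence Listing.

```python
# Loop positions (1-based) that coordinate calcium, with their conventional labels.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(loop: str) -> dict:
    """Map each coordination label to the residue found at that loop position."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to contain twelve residues")
    return {label: loop[position - 1] for position, label in LIGAND_POSITIONS.items()}


def has_bidentate_ligand(loop: str) -> bool:
    """Check for the invariant Glu or Asp at position 12."""
    return len(loop) == 12 and loop[11] in "DE"


# Illustrative loop only (assumed example, see the lead-in).
EXAMPLE_LOOP = "DKDGDGTITTKE"
EXAMPLE_LIGANDS = ef_hand_ligands(EXAMPLE_LOOP)
```

In this example the residue at position 12 is Glu, which corresponds to the bidentate ligand described in the text.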

Proteins known to contain EF-hand regions include: calmodulin (Ca=4, except in yeast
where Ca=3) ("Ca=" indicates approximate number of EF-hand regions); diacylglycerol kinase
(EC 2.7.1.107) (DGK) (Ca=2); FAD-dependent glycerol-3-phosphate dehydrogenase (EC
1.1.99.5) from mammals (Ca=1); guanylate cyclase activating protein (GCAP) (Ca=3); MIF
related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2); myosin regulatory light chains
(Ca=1); oncomodulin (Ca=2); osteonectin (basement membrane protein BM-40) (SPARC); and
proteins that contain an "osteonectin" domain (QR1, matrix glycoprotein SC1).

The consensus pattern includes the complete EF-hand loop as well as the first residue
which follows the loop and which seems to always be hydrophobic.

Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-
[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]

h) Eukaryotic Aspartyl Proteases. SEQ ID NO:308 corresponds to a gene encoding a
novel eukaryotic aspartyl protease. Aspartyl proteases, also known as acid proteases (EC 3.4.23.-),
are a widely distributed family of proteolytic enzymes (Foltmann B., Essays Biochem. (1981)
17:52; Davies D.R., Annu. Rev. Biophys. Chem. (1990) 19:189; Rao J.K.M., et al., Biochemistry
(1991) 30:4663) known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses.
Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each
domain contains an active site centered on a catalytic aspartyl residue. The two domains most
probably evolved from the duplication of an ancestral gene encoding a primordial domain.
Currently known eukaryotic aspartyl proteases include: 1) Vertebrate gastric pepsins A and C
(also known as gastricsin); 2) Vertebrate chymosin (rennin), involved in digestion and used for
making cheese; 3) Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34); 4)
Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from
angiotensinogen in the plasma; 5) Fungal proteases such as aspergillopepsin A (EC 3.4.23.18),
candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin
(EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21); 6)
Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4), which is implicated in
posttranslational regulation of vacuolar hydrolases; 7) Yeast barrierpepsin (EC 3.4.23.35) (gene
BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating
pheromone; and 8) Fission yeast sxa1, which is involved in degrading or processing the mating
pheromones.

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl
protease which is a homodimer of a chain of about 95 to 125 amino acids. In most
retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the
maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of
the gag polyprotein. Because the sequence around the two aspartates of eukaryotic aspartyl
proteases and around the single active site of the viral proteases is conserved, a single signature
pattern can be used to identify members of both groups of proteases. The consensus pattern is:
[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-
[LIVMFSTNC]-x-[LIVMFGTA], where D is the active site residue.
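
A signature of this kind can be used to locate candidate active-site aspartates in a polypeptide sequence. The Python sketch below writes the consensus pattern above by hand as a regular expression and reports the 1-based position of the D residue in each match. The function name and the test peptide are hypothetical, and the peptide is a made-up sequence built to satisfy the pattern.

```python
import re

# The aspartyl protease signature written as a regular expression.
# A lookahead is used so that overlapping matches are also reported.
_SIGNATURE = re.compile(
    r"(?=([LIVMFGAC][LIVMTADN][LIVFSA]D[ST]G[STAV][STAPDENQ].[LIVMFSTNC].[LIVMFGTA]))"
)

# The active-site D is the fourth residue of the signature.
_ASP_OFFSET = 3


def find_active_site_aspartates(protein: str) -> list:
    """Return 1-based positions of candidate active-site aspartates (sketch)."""
    return [m.start() + _ASP_OFFSET + 1 for m in _SIGNATURE.finditer(protein.upper())]


# Made-up peptide for illustration; it is not a sequence from the Sequence Listing.
EXAMPLE_PEPTIDE = "MKVIFDTGSSNLWVPS"
EXAMPLE_POSITIONS = find_active_site_aspartates(EXAMPLE_PEPTIDE)
```

For a two-domain eukaryotic enzyme one would expect up to two such positions, and for a viral protease chain a single position, in line with the description above.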

i) GATA Family of Transcription Factors. SEQ ID NO:213 corresponds to a novel
member of the GATA family of transcription factors. The GATA family of transcription factors
are proteins that bind to DNA sites with the consensus sequence (A/T)GATA(A/G), found
within the regulatory region of a number of genes. Proteins currently known to belong to this
family are: 1) GATA-1 (Trainor, C.D., et al., Nature (1990) 343:92) (also known as Eryf1,
GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes expressed in
erythroid cells. It is a transcriptional activator which probably serves as a general 'switch' factor
for erythroid development; 2) GATA-2 (Lee, M.E., et al., J. Biol. Chem. (1991) 266:16188), a
transcriptional activator which regulates endothelin-1 gene expression in endothelial cells; 3)
GATA-3 (Ho, I.-C., et al., EMBO J. (1991) 10:1187), a transcriptional activator which binds to
the enhancer of the T-cell receptor alpha and delta genes; 4) GATA-4 (Spieth, J., et al., Mol.
Cell Biol. (1991) 11:4651), a transcriptional activator expressed in endodermally derived
tissues and heart; 5) Drosophila protein pannier (or DGATAa) (gene pnr), which acts as a
repressor of the achaete-scute complex (as-c); 6) Bombyx mori BCFI (Drevet, J.R., et al., J.
Biol. Chem. (1994) 269:10660), which regulates the expression of chorion genes; 7)
Caenorhabditis elegans elt-1 and elt-2, transcriptional activators of genes containing the GATA
region, including vitellogenin genes (Hawkins, M.G., et al., J. Biol. Chem. (1995) 270:14666);
8) Ustilago maydis urbs1 (Voisard, C.P.O., et al., Mol. Cell Biol. (1993) 13:7091), a protein
involved in the repression of the biosynthesis of siderophores; 9) Fission yeast protein GAF2.

All these transcription factors contain a pair of highly similar 'zinc finger' type domains
with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins contain a single zinc
finger motif highly related to those of the GATA transcription factors. These proteins are:
1) Drosophila box A-binding factor (ABF) (also known as protein serpent (gene srp)), which
may function as a transcriptional activator protein and may play a key role in the organogenesis
of the fat body; 2) Emericella nidulans areA (Arst, H.N., Jr., et al., Trends Genet. (1989) 5:291), a
transcriptional activator which mediates nitrogen metabolite repression; 3) Neurospora crassa
nit-2 (Fu, Y.-H., et al., Mol. Cell Biol. (1990) 10:1056), a transcriptional activator which turns
on the expression of genes coding for enzymes required for the use of a variety of secondary
nitrogen sources, during conditions of nitrogen limitation; 4) Neurospora crassa white collar
proteins 1 and 2 (WC-1 and WC-2), which control expression of light-regulated genes; 5)
Saccharomyces cerevisiae DAL81 (or UGA43), a negative nitrogen regulatory protein; 6)
Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein; 7) Saccharomyces
cerevisiae GAT1; 8) Saccharomyces cerevisiae GZF3.

The consensus pattern for the GATA family is: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-
[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C, where the four Cs are zinc ligands.
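
The DNA consensus site (A/T)GATA(A/G) recognized by GATA family members can be located in a nucleotide sequence with a short scan of both strands. The following Python sketch is illustrative only: the function name is hypothetical, and reporting the leftmost coordinate of each site on the given strand (1-based) is an assumed convention.

```python
import re

# (A/T)GATA(A/G) on the given strand, and its reverse complement (C/T)TATC(A/T)
# for sites that lie on the opposite strand. Lookaheads allow overlapping sites.
_FORWARD_SITE = re.compile(r"(?=([AT]GATA[AG]))")
_REVERSE_SITE = re.compile(r"(?=([CT]TATC[AT]))")


def find_gata_sites(dna: str) -> list:
    """Return sorted (position, strand) pairs for candidate GATA sites (sketch).

    Positions are 1-based and refer to the leftmost base of the six-base site
    on the sequence as given; strand is '+' or '-'.
    """
    dna = dna.upper()
    hits = [(m.start() + 1, "+") for m in _FORWARD_SITE.finditer(dna)]
    hits += [(m.start() + 1, "-") for m in _REVERSE_SITE.finditer(dna)]
    return sorted(hits)
```

For example, `find_gata_sites("CCAGATAAGG")` reports a single forward-strand site beginning at position 3.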

j) G-Protein Alpha Subunit. SEQ ID NO: 367 corresponds to a gene encoding a novel
polypeptide of the G-protein alpha subunit family. Guanine nucleotide binding proteins
(G-proteins) are a family of membrane-associated proteins that couple extracellularly-activated
integral-membrane receptors to intracellular effectors, such as ion channels and enzymes that
vary the concentration of second messenger molecules. G-proteins are composed of 3 subunits
(alpha, beta and gamma) which, in the resting state, associate as a trimer at the inner face of the
plasma membrane. The alpha subunit has a molecule of guanosine diphosphate (GDP) bound to
it. Stimulation of the G-protein by an activated receptor leads to the exchange of this GDP for GTP
(guanosine triphosphate). This results in the separation of the alpha from the beta and gamma
subunits, which always remain tightly associated as a dimer. Both the alpha and beta-gamma
subunits are then able to interact with effectors, either individually or in a cooperative manner.
The intrinsic GTPase activity of the alpha subunit hydrolyses the bound GTP to GDP. This
returns the alpha subunit to its inactive conformation and allows it to reassociate with the
beta-gamma subunit, thus restoring the system to its resting state.
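
The activation cycle described above can be summarized as a small two-state model. The Python sketch below is a deliberately simplified illustration: the class and method names are hypothetical, and hydrolysis and reassociation with the beta-gamma dimer are collapsed into a single step.

```python
from dataclasses import dataclass


@dataclass
class GProtein:
    """Toy model of the G-protein cycle (illustrative simplification)."""

    nucleotide: str = "GDP"   # nucleotide bound to the alpha subunit
    trimer: bool = True       # True while alpha is associated with beta-gamma

    @property
    def active(self) -> bool:
        """The alpha subunit can interact with effectors only when GTP-bound."""
        return self.nucleotide == "GTP"

    def receptor_stimulation(self) -> None:
        """An activated receptor triggers GDP/GTP exchange and subunit separation."""
        if self.nucleotide == "GDP" and self.trimer:
            self.nucleotide = "GTP"
            self.trimer = False

    def gtp_hydrolysis(self) -> None:
        """Intrinsic GTPase activity returns alpha to the GDP-bound resting trimer."""
        if self.nucleotide == "GTP":
            self.nucleotide = "GDP"
            self.trimer = True
```

Calling `receptor_stimulation()` followed by `gtp_hydrolysis()` on a new `GProtein()` returns the model to its resting state, mirroring the cycle in the text.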

G-protein alpha subunits are 350-400 amino acids in length and have molecular weights
in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in
mammals. These fall into 4 main groups on the basis of both sequence similarity and function:
alpha-s, alpha-q, alpha-i and alpha-12 (Simon et al., Science (1993) 252:802). Many alpha
subunits are substrates for ADP-ribosylation by cholera or pertussis toxins. They are often
N-terminally acylated, usually with myristate and/or palmitate, and these fatty acid
modifications are probably important for membrane association and high-affinity interactions
with other proteins. The atomic structure of the alpha subunit of the G-protein involved in
mammalian vision, transducin, has been elucidated in both GTP- and GDP-bound forms, and
shows considerable similarity in both primary and tertiary structure in the nucleotide-binding
regions to other guanine nucleotide binding proteins, such as p21-ras and EF-Tu.

k) Phorbol Esters/Diacylglycerol Binding. SEQ ID NOS:188 and 251 represent 
polynucleotides encoding a protein belonging to the family including phorbol 
esters/diacylglycerol binding proteins. Diacylglycerol (DAG) is an important second 
messenger. Phorbol esters (PE) are analogues of DAG and potent tumor promoters that cause a 
variety of physiological changes when administered to both cells and tissues. DAG activates a 
family of serine/threonine protein kinases, collectively known as protein kinase C (PKC) (Azzi 
et al., Eur. J. Biochem. (1992) 205:547). Phorbol esters can directly stimulate PKC. The 
N-terminal region of PKC, known as C1, has been shown (Ono et al., Proc. Natl. Acad. Sci. USA 
(1989) 86:4868) to bind PE and DAG in a phospholipid and zinc-dependent fashion. The C1 
region contains one or two copies (depending on the isozyme of PKC) of a cysteine-rich 
domain about 50 amino-acid residues long and essential for DAG/PE-binding. Such a domain 
has also been found in, for example, the following proteins. 

(1) Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Sakane et al., Nature (1990) 344:345), 
the enzyme that converts DAG into phosphatidate. It contains two copies of the DAG/PE- 
binding domain in its N-terminal section. At least five different forms of DGK are known in 
mammals; and 

(2) N-chimaerin, a brain specific protein which shows sequence similarities with the 
BCR protein at its C-terminal part and contains a single copy of the DAG/PE-binding domain at 
its N-terminal part. It has been shown (Ahmed et al., Biochem. J. (1990) 272:161, and Ahmed 
et al., Biochem. J. (1991) 250:233) to be able to bind phorbol esters. 

The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are 
probably the six cysteines and two histidines that are conserved in this domain. The signature 
pattern completely spans the DAG/PE domain. The consensus pattern is: H-x-[LIVMFYW]- 
x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-x(2)-C-x(5,9)-C. All the C and 
H are probably involved in binding zinc. 
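
The consensus patterns given in this section use a PROSITE-style notation: x stands for any 
residue, [..] lists the residues allowed at a position, {..} lists the residues excluded at a 
position, and a parenthesized number or range gives a repeat count. As an illustration only, 
the following minimal sketch shows one way such a pattern could be translated into a regular 
expression and used to scan a polypeptide sequence. The function names prosite_to_regex and 
find_motif and the toy sequence are assumptions introduced for this example; they are not part 
of the methods described herein. 

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat such as (2) or (8,11).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        spec, low, high = match.groups()
        if spec == "x":                 # x: any residue
            core = "."
        elif spec.startswith("{"):      # {P}: any residue except those listed
            core = "[^" + spec[1:-1] + "]"
        else:                           # [LIV] or a single literal residue such as C
            core = spec
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


def find_motif(pattern: str, sequence: str):
    """Return (start, matched_text) for each non-overlapping match in the sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # DAG/PE-binding domain consensus pattern as given in the text above.
    dag_pe = ("H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-"
              "C-x(2)-C-x(4)-[HD]-x(2)-C-x(5,9)-C")
    print(prosite_to_regex(dag_pe))
    # Hypothetical sequence built only to satisfy the pattern; it is not a real protein.
    toy = ("MK" + "HAL" + "A" * 8 + "CAAC" + "AAA" + "L" + "A" * 5
           + "CAAC" + "AAAA" + "HAAC" + "A" * 5 + "C")
    print(find_motif(dag_pe, toy))
```

A match located this way indicates only that the sequence is consistent with the signature; 
as noted for several families in this section, some true family members are missed by a 
pattern and some unrelated sequences can be detected by it. 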

l) Protein Kinase. SEQ ID NOS:202, 315, 367, and 397 represent polynucleotides 
encoding protein kinases. Protein kinases catalyze phosphorylation of proteins in a variety of 
pathways, and are implicated in cancer. Eukaryotic protein kinases (Hanks S.K., et al., FASEB 
J. (1995) 9:576; Hunter T., Meth. Enzymol. (1991) 200:3; Hanks S.K., et al., Meth. Enzymol. 
(1991) 200:38; Hanks S.K., Curr. Opin. Struct. Biol. (1991) 7:369; Hanks S.K., et al., Science 
(1988) 241:42) are enzymes that belong to a very extensive family of proteins which share a 
conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There 
are a number of conserved regions in the catalytic domain of protein kinases. Two of the 
conserved regions are the basis for the signature pattern in the protein kinase profile. The first 
region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich 
stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in 
ATP binding. The second region, which is located in the central part of the catalytic domain, 
contains a conserved aspartic acid residue which is important for the catalytic activity of the 
enzyme (Knighton D.R., et al., Science (1991) 255:407). The protein kinase profile includes 
two signature patterns for this second region: one specific for serine/threonine kinases and the 
other for tyrosine kinases. A third profile is based on the alignment in (Hanks S.K., et al., 
FASEB J. (1995) 9:576) and covers the entire catalytic domain. The consensus patterns are as 
follows: 

1) Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]- 
{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K, where 
K binds ATP. The majority of known protein kinases are detected by this pattern. Protein 
kinases that are not detected by this consensus include viral kinases, which are quite divergent 
in this region and are completely missed by this pattern. 

2) Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N- 
[LIVMFYCT](3), where D is an active site residue. This consensus sequence identifies most 
serine/threonine-specific protein kinases with only 10 exceptions. Half of the exceptions are 
viral kinases, while the other exceptions include Epstein-Barr virus BGLF4 and Drosophila 
ninaC, which have Ser and Arg, respectively, instead of the conserved Lys. These latter two 
protein kinases are detected by the tyrosine kinase specific pattern described below. 

3) Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N- 
[LIVMFYC], where D is an active site residue. All tyrosine-specific protein kinases are 
detected by this consensus pattern, with the exception of human ERBB3 and mouse blk. This 
pattern also detects most bacterial aminoglycoside phosphotransferases (Benner S., Nature 
(1987) 329:21; Kirby R., J. Mol. Evol. (1992) 30:489) and herpesvirus ganciclovir kinases 
(Littler E., et al., Nature (1992) 358:160), which are structurally and evolutionarily related to 
protein kinases. 

The protein kinase profile also detects receptor guanylate cyclases and 2-5A-dependent 
ribonucleases. Sequence similarities between these two families and the eukaryotic protein 
kinase family have been noticed previously. The profile also detects Arabidopsis thaliana 
kinase-like protein TMKL1 which seems to have lost its catalytic activity. 

If a protein analyzed includes both of the above protein kinase signatures (the ATP-binding 
region and the active-site region), the probability of it being a protein kinase is close to 
100%. Eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus 
xanthus (Munoz-Dorado J., et al., Cell (1991) 7:995) and Yersinia pseudotuberculosis. The 
patterns shown above have been updated since their publication in (Bairoch A., et al., Nature 
(1988) 331:22). 

m) Protein Phosphatase 2C. SEQ ID NO:256 corresponds to a polynucleotide encoding 
a novel protein phosphatase 2C (PP2C), which is one of the four major classes of mammalian 
serine/threonine specific protein phosphatases. PP2C (Wenk et al., FEBS Lett. (1992) 297:135) 
is a monomeric enzyme of about 42 kDa which shows broad substrate specificity and is 
dependent on divalent cations (mainly manganese and magnesium) for its activity. Three 
isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. 

n) Protein Tyrosine Phosphatase. SEQ ID NO:382 represents a polynucleotide encoding 
a protein tyrosine phosphatase. Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) 
(Fischer et al., Science (1991) 253:401; Charbonneau et al., Annu. Rev. Cell Biol. (1992) 5:463; 
Trowbridge, J. Biol. Chem. (1991) 266:23517; Tonks et al., Trends Biochem. Sci. (1989) 
14:497; and Hunter, Cell (1989) 55:1013) catalyze the removal of a phosphate group attached to 
a tyrosine residue. These enzymes are very important in the control of cell growth, 
proliferation, differentiation and transformation. Multiple forms of PTPase have been 
characterized and can be classified into two categories: soluble PTPases and transmembrane 
receptor proteins that contain PTPase domain(s). 

Soluble PTPases include PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an 
N-terminal band 4.1-like domain and could act at junctions between the membrane and 
cytoskeleton; and PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes 
that contain two copies of the SH2 domain at their N-terminal extremity. 

Dual specificity PTPases include DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), 
which dephosphorylates MAP kinase on both Thr-183 and Tyr-185; and DUSP2 (PAC-1), a 
nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr 
residues. 

Structurally, all known receptor PTPases are made up of a variable length extracellular 
domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. 
Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like 
domains, MAM domains or carbonic anhydrase-like domains in their extracellular region. The 
cytoplasmic region generally contains two copies of the PTPase domain. The first seems to 
have enzymatic activity, while the second is inactive but seems to affect substrate specificity of 
the first. In these domains, the catalytic cysteine is generally conserved but some other, 
presumably important, residues are not. 

PTPase domains consist of about 300 amino acids. There are two conserved cysteines 
and the second one has been shown to be absolutely required for activity. Furthermore, a 
number of conserved residues in its immediate vicinity have also been shown to be important. 
The consensus pattern for PTPases is: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x- 
[LIVMFY]; C is the active site residue. 

o) SH3 Domain. SEQ ID NOS:306 and 386 represent polynucleotides encoding SH3 
domain proteins. The Src homology 3 (SH3) domain is a small protein domain of about 60 
amino acid residues first identified as a conserved sequence in the non-catalytic part of several 
cytoplasmic protein tyrosine kinases (e.g., Src, Abl, Lck) (Mayer et al., Nature (1988) 332:272). 
The domain has also been found in a variety of intracellular or membrane-associated proteins 
(Musacchio et al., FEBS Lett. (1992) 507:55; Pawson et al., Curr. Biol. (1993) 5:434; Mayer et 
al., Trends Cell Biol. (1993) 5:8; and Pawson et al., Nature (1995) 575:573). 

The SH3 domain has a characteristic fold that consists of five or six beta-strands 
arranged as two tightly packed anti-parallel beta sheets. The linker regions may contain short 
helices (Kuriyan et al, Curr. Opin. Struct Biol (1993) 5:828). It is believed that SH3 domain- 
containing proteins mediate assembly of specific protein complexes via binding to proline-rich 
peptides (Morton et al, Curr. Biol (1994) 4:615). In general, SH3 domains are found as single 
copies in a given protein, but there is a significant number of proteins with two SH3 domains 
and a few with 3 or 4 copies. 

SH3 domains have been identified in, for example, protein tyrosine kinases, such as the 
Src, Abl, Btk, Csk and ZAP70 families of kinases; mammalian phosphatidylinositol-specific 
phospholipase C-gamma-1 and -2; mammalian phosphatidylinositol 3-kinase regulatory p85 
subunit; mammalian Ras GTPase-activating protein (GAP); mammalian Vav oncoprotein, a 
guanine nucleotide exchange factor of the CDC24 family; Drosophila lethal(1)discs large-1 
tumor suppressor protein (gene Dlg1); mammalian tight junction protein ZO-1; vertebrate 
erythrocyte membrane protein p55; Caenorhabditis elegans protein lin-2; rat protein CASK; and 
mammalian synaptic proteins SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and 
SAP102. Novel SH3-domain containing polypeptides will facilitate elucidation of the role of 
such proteins in important biological pathways, such as ras activation. 

p) Trypsin. SEQ ID NO:169 corresponds to a novel serine protease of the trypsin 
family. The catalytic activity of the serine proteases from the trypsin family is provided by a 
charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which 
itself is hydrogen-bonded to a serine. The sequences in the vicinity of the active site serine and 
histidine residues are well conserved in this family of proteases (Brenner S., Nature (1988) 
334:528). Proteases known to belong to the trypsin family include: 1) Acrosin; 2) Blood 
coagulation factors VII, IX, X, XI and XII, thrombin, plasminogen, and protein C; 3) Cathepsin 
G; 4) Chymotrypsins; 5) Complement components C1r, C1s, C2, and complement factors B, D 
and I; 6) Complement-activating component of RA-reactive factor; 7) Cytotoxic cell proteases 
(granzymes A to H); 8) Duodenase I; 9) Elastases 1, 2, 3A, 3B (protease E), leukocyte 
(medullasin); 10) Enterokinase (EC 3.4.21.9) (enteropeptidase); 11) Hepatocyte growth factor 
activator; 12) Hepsin; 13) Glandular (tissue) kallikreins (including EGF-binding protein types 
A, B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and tonin); 14) 
Plasma kallikrein; 15) Mast cell proteases (MCP) 1 (chymase) to 8; 16) Myeloblastin 
(proteinase 3) (Wegener's autoantigen); 17) Plasminogen activators (urokinase-type and 
tissue-type); 18) Trypsins I, II, III, and IV; 19) Tryptases; 20) Snake venom proteases such as 
ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator; 21) Collagenase from common 
cattle grub and collagenolytic protease from Atlantic sand fiddler crab; 22) Apolipoprotein(a); 
23) Blood fluke cercarial protease; 24) Drosophila trypsin like proteases: alpha, easter, 
snake-locus; 25) Drosophila protease stubble (gene sb); and 26) Major mite fecal allergen Der p III. 
All the above proteins belong to family S1 in the classification of peptidases (Rawlings N.D., et 
al., Meth. Enzymol. (1994) 244:19; http://www.expasy.ch/cgi-bin/lists?peptidas.txt) and 
originate from eukaryotic species. It should be noted that bacterial proteases that belong to 
family S2A are similar enough in the regions of the active site residues that they can be picked 
up by the same patterns. 

The consensus patterns for this trypsin protein family are: 1) [LIVM]-[ST]-A-[STAG]- 
H-C, where H is the active site residue. All sequences known to belong to this class are 
detected by the pattern, except for complement components C1r and C1s, pig plasminogen, bovine 
protein C, rodent urokinase, ancrod, gyroxin and two insect trypsins; 2) [DNSTAGC]- 
[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH], 
where S is the active site residue. All sequences known to belong to this family are detected by 
the above consensus sequences, except for 18 different proteases which have lost the first 
conserved glycine. If a protein includes both the serine and the histidine active site signatures, 
the probability of it being a trypsin family serine protease is 100%. 

q) WD Domain, G-Beta Repeats. SEQ ID NOS:188 and 335 represent novel members 
of the WD domain/G-beta repeat family. Beta-transducin (G-beta) is one of the three subunits 
(alpha, beta, and gamma) of the guanine nucleotide-binding proteins (G proteins) which act as 
intermediaries in the transduction of signals generated by transmembrane receptors (Gilman, 
Annu. Rev. Biochem. (1987) 56:615). The alpha subunit binds to and hydrolyzes GTP; the 
functions of the beta and gamma subunits are less clear but they seem to be required for the 
replacement of GDP by GTP as well as for membrane anchoring and receptor recognition. 

In higher eukaryotes, G-beta exists as a small multigene family of highly conserved 
proteins of about 340 amino acid residues. Structurally, G-beta consists of eight tandem repeats 
of about 40 residues, each containing a central Trp-Asp motif (this type of repeat is sometimes 
called a WD-40 repeat). Such a repetitive segment has been shown to exist in a number of other 
proteins including: human LIS1, a neuronal protein involved in type-1 lissencephaly; and 
mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex 
that reversibly associates with Golgi membranes to form vesicles that mediate biosynthetic 
protein transport. 

The consensus pattern for the WD domain/G-Beta repeat family is: [LIVMSTAC]- 
[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x- 
[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]. 

r) wnt Family of Developmental Signaling Proteins. SEQ ID NOS:23, 291, 324, 330, 
341, and 353 correspond to novel members of the wnt family of developmental signaling 
proteins. Wnt-1 (previously known as int-1), the seminal member of this family (Nusse R., 
Trends Genet. (1988) 4:291), is a proto-oncogene induced by the integration of the mouse 
mammary tumor virus. It is thought to play a role in intercellular communication and seems to 
be a signalling molecule important in the development of the central nervous system (CNS). 
The sequence of wnt-1 is highly conserved in mammals, fish, and amphibians. Wnt-1 was 
found to be a member of a large family of related proteins (Nusse R., et al., Cell (1992) 
69:1073; McMahon A.P., Trends Genet. (1992) 8:1; Moon R.T., BioEssays (1993) 75:91) that 
are all thought to be developmental regulators. These proteins are known as wnt-2 (also known 
as irp), wnt-3, -3A, -4, -5A, -5B, -6, -7A, -7B, -8, -8B, -9 and -10. At least four members of this 
family are present in Drosophila; one of them, wingless (wg), is implicated in segmentation 
polarity. All these proteins share the following features characteristic of secretory 
proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines 
that are probably involved in disulfide bonds. The Wnt proteins seem to adhere to the plasma 
membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. 
The consensus pattern, which is based upon a highly conserved region including three cysteines, 
is as follows: C-K-C-H-G-[LIVMT]-S-G-x-C. All sequences known to belong to this family are 
detected by the provided consensus pattern. 

s) WW/rsp5/WWP Domain-Containing Proteins. SEQ ID NOS:188, 379, and 395 
represent polynucleotides encoding a polypeptide in the family of WW/rsp5/WWP domain- 
containing proteins. The WW domain (Bork et al., Trends Biochem. Sci. (1994) 79:531; Andre 
et al., Biochem. Biophys. Res. Commun. (1994) 205:1201; Hofmann et al., FEBS Lett. (1995) 
358:153; and Sudol et al., FEBS Lett. (1995) 369:67), also known as rsp5 or WWP, was 
originally discovered as a short conserved region in a number of unrelated proteins, among them 
dystrophin, the product of the gene responsible for Duchenne muscular dystrophy. The domain, 
which spans about 35 residues, is repeated up to 4 times in some proteins. It has been shown 
(Chen et al., Proc. Natl. Acad. Sci. USA (1995) 92:7819) to bind proteins with particular 
proline-motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It appears to contain 
beta-strands grouped around four conserved aromatic positions, generally Trp. The name WW or 
WWP derives from the presence of these Trp as well as that of a conserved Pro. It is frequently 
associated with other domains typical for proteins in signal transduction processes. 
Proteins containing the WW domain include: 

1. Dystrophin, a multidomain cytoskeletal protein. Its longest alternatively spliced 
form consists of an N-terminal actin-binding domain, followed by 24 spectrin-like repeats, a 
cysteine-rich calcium-binding domain and a C-terminal globular domain. Dystrophin forms 
tetramers and is thought to have multiple functions including involvement in membrane 
stability, transduction of contractile forces to the extracellular environment and organization of 
membrane specialization. Mutations in the dystrophin gene lead to muscular dystrophy of 
Duchenne or Becker type. Dystrophin contains one WW domain C-terminal of the 
spectrin-repeats. 

2. Vertebrate YAP protein, which is a substrate of an unknown serine kinase. It 
binds to the SH3 domain of the Yes oncoprotein via a proline-rich region. This protein appears 
in alternatively spliced isoforms, containing either one or two WW domains. 

3. IQGAP, which is a human GTPase activating protein acting on ras. It contains 
an N-terminal domain similar to fly muscle mp20 protein and a C-terminal ras GTPase activator 
domain. 

For the sensitive detection of WW domains, a profile that spans the whole homology 
region is used in addition to a pattern. The consensus for this family is: W-x(9,11)-[VFY]- 
[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P. 

t) Zinc Finger, C2H2 Type. SEQ ID NOS:61, 306, and 386 correspond to polynucleotides 
encoding novel members of the C2H2 type zinc finger protein family. Zinc finger 
domains (Klug et al., Trends Biochem. Sci. (1987) 72:464; Evans et al., Cell (1988) 52:1; Payre 
et al., FEBS Lett. (1988) 234:245; Miller et al., EMBO J. (1985) 4:1609; and Berg, Proc. Natl. 
Acad. Sci. USA (1988) 85:99) are nucleic acid-binding protein structures first identified in the 
Xenopus transcription factor TFIIIA. These domains have since been found in numerous 
nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino acid 
residues. Two cysteine or histidine residues are positioned at both extremities of the domain, 
which are involved in the tetrahedral coordination of a zinc atom. It has been proposed that 
such a domain interacts with about five nucleotides. 

Many classes of zinc fingers are characterized according to the number and positions of 
the histidine and cysteine residues involved in the zinc atom coordination. In the first class to 
be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines, while 
the second pair are histidines. A number of experimental reports have demonstrated the zinc- 
dependent DNA or RNA binding property of some members of this class. 

Mammalian proteins having a C2H2 zinc finger include (the number in parentheses 
indicates the number of zinc finger regions in the protein): basonuclin (6), BCL-6/LAZ-3 (6), 
erythroid krueppel-like transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3) 
and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4), EGR1/Krox24 (3), 
EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1 (10), GLI1 (5), GLI2 (4+), GLI3 (3+), 
HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1 (9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 
(6+), REX-1 (4), ZfX (13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), 
ZNF42/MZF-1 (13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3). 

In addition to the conserved zinc ligand residues, it has been shown that a number of 
other positions are also important for the structural integrity of the C2H2 zinc fingers 
(Rosenfeld et al., J. Biomol. Struct. Dyn. (1993) 77:557). The best conserved position is found 
four residues after the second cysteine; it is generally an aromatic or aliphatic residue. The 
consensus pattern for C2H2 zinc fingers is: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. 
The two C's and two H's are zinc ligands. 

u) Zinc Finger, CCHC Class. SEQ ID NO:322 corresponds to a polynucleotide 
encoding a novel member of the zinc finger CCHC family. The CCHC zinc finger protein 
family to date has been mostly composed of retroviral gag proteins (nucleocapsid). The 
prototype structure of this family is from HIV. The family also contains members involved in 
eukaryotic gene regulation, such as C. elegans GLH-1. The consensus sequence of this family is 
based upon the common structure of an 18-residue zinc finger. 

v) Zinc-Binding Metalloprotease Domain. SEQ ID NOS:306 and 395 represent 
polynucleotides encoding novel members of the zinc-binding metalloprotease domain protein 
family. The majority of zinc-dependent metallopeptidases (with the notable exception of the 
carboxypeptidases) share a common pattern of primary structure (Jongeneel et al., FEBS Lett. 
(1989) 242:211; Murphy et al., FEBS Lett. (1991) 289:4; and Bode et al., Zoology (1996) 
99:237) in the part of their sequence involved in the binding of zinc, and can be grouped 
together as a superfamily, known as the metzincins, on the basis of this sequence similarity. 
Examples of these proteins include: 1) Angiotensin-converting enzyme (EC 3.4.15.1) 
(dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for hydrolyzing angiotensin I to 
angiotensin II. 2) Mammalian extracellular matrix metalloproteinases (known as matrixins) 
(Woessner, FASEB J. (1991) 5:2145): MMP-1 (EC 3.4.24.7) (interstitial collagenase), MMP-2 
(EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 
3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3 (EC 
3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22) (stromelysin-2), MMP-11 
(stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage metalloelastase). 3) Endothelin- 
converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to 
release the active peptide. 

A signature pattern which includes the two histidine and the glutamic acid residues is 
sufficient to detect this superfamily of proteins, having the consensus pattern: [GSTALIVN]- 
x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ]. The two H's are zinc ligands, 
and E is the active site residue. 

Example 4: Differential Expression of Polynucleotides of the Invention: Description of 
Libraries and Detection of Differential Expression 

The relative expression levels of the polynucleotides of the invention were assessed in 
several libraries prepared from various sources, including cell lines and patient tissue samples. 
Table 4 provides a summary of these libraries, including the shortened library name (used 
hereafter), the mRNA source used to prepare the cDNA library, the "nickname" of the library 
that is used in the tables below (in quotes), and the approximate number of clones in the library. 

Table 4. Description of cDNA Libraries 

Library   Description                                               Number of Clones 
(lib #)                                                             in this Clustering 

1         Km12L4                                                    307133 
          Human Colon Cell Line, High Metastatic Potential 
          (derived from Km12C) 
          "High Colon" 

2         Km12C                                                     284755 
          Human Colon Cell Line, Low Metastatic Potential 
          "Low Colon" 

3         MDA-MB-231                                                326937 
          Human Breast Cancer Cell Line, High Metastatic 
          Potential; micro-metastases in lung 
          "High Breast" 

4         MCF7                                                      318979 
          Human Breast Cancer Cell, Non Metastatic 
          "Low Breast" 

8         MV-522                                                    223620 
          Human Lung Cancer Cell Line, High Metastatic Potential 
          "High Lung" 

9         UCP-3                                                     [illegible] 
          Human Lung Cancer Cell Line, Low Metastatic Potential 
          "Low Lung" 

12        Human microvascular endothelial cells (HMEC) - untreated  [illegible] 
          PCR (OligodT) cDNA library 

13        Human microvascular endothelial cells (HMEC) - bFGF       47100 
          PCR (OligodT) cDNA library 

14        Human microvascular endothelial cells (HMEC) - VEGF       [illegible] 
          PCR (OligodT) cDNA library 

15        Normal Colon - UC#2 Patient                               34285 
          PCR (OligodT) cDNA library 
          "Normal Colon Tumor Tissue" 

16        Colon Tumor - UC#2 Patient                                35625 
          PCR (OligodT) cDNA library 
          "Normal Colon Tumor Tissue" 

17        Liver Metastasis from Colon Tumor of UC#2 Patient         36984 
          PCR (OligodT) cDNA library 
          "High Colon Metastasis Tissue" 

18        Normal Colon - UC#3 Patient                               36216 
          PCR (OligodT) cDNA library 
          "Normal Colon Tumor Tissue" 

19        Colon Tumor - UC#3 Patient                                41388 
          PCR (OligodT) cDNA library 
          "High Colon Tumor Tissue" 

20        Liver Metastasis from Colon Tumor of UC#3 Patient         30956 
          PCR (OligodT) cDNA library 
          "High Colon Metastasis Tissue" 


The KM12L4 and KM12C cell lines are described in Example 1 above. The MDA-MB-231 
cell line was originally isolated from pleural effusions (Cailleau, J. Natl. Cancer Inst. 
(1974) 53:661), is of high metastatic potential, and forms poorly differentiated adenocarcinoma 
grade II in nude mice consistent with breast carcinoma. The MCF7 cell line was derived from a 
pleural effusion of a breast adenocarcinoma and is non-metastatic. The MV-522 cell line is 
derived from a human lung carcinoma and is of high metastatic potential. The UCP-3 cell line 
is a low metastatic human lung carcinoma cell line; the MV-522 is a high metastatic variant of 
UCP-3. These cell lines are well-recognized in the art as models for the study of human breast 
and lung cancer (see, e.g., Chandrasekaran et al., Cancer Res. (1979) 39:870 (MDA-MB-231 
and MCF-7); Gastpar et al., J Med Chem (1998) 41:4965 (MDA-MB-231 and MCF-7); Ranson 
et al., Br J Cancer (1998) 77:1586 (MDA-MB-231 and MCF-7); Kuang et al., Nucleic Acids 
Res (1998) 26:1116 (MDA-MB-231 and MCF-7); Varki et al., Int J Cancer (1987) 40:46 
(UCP-3); Varki et al., Tumour Biol (1990) 77:327 (MV-522 and UCP-3); Varki et al., Anticancer 
Res. (1990) 10:637 (MV-522); Kelner et al., Anticancer Res (1995) 75:867 (MV-522); and 
Zhang et al., Anticancer Drugs (1997) 5:696 (MV-522)). The samples of libraries 15-20 are 
derived from two different patients (UC#2 and UC#3). 

Each of the libraries is composed of a collection of cDNA clones that in turn are 
representative of the mRNAs expressed in the indicated mRNA source. In order to facilitate the 
analysis of the millions of sequences in each library, the sequences were assigned to clusters. 

The concept of "cluster of clones" is derived from a sorting/grouping of cDNA clones based on 
their hybridization pattern to a panel of roughly 300 7-bp oligonucleotide probes (see Drmanac et 
al., Genomics (1996) 37(1):29). Random cDNA clones from a tissue library are hybridized at 
moderate stringency to 300 7-bp oligonucleotides. Each oligonucleotide has some measure of 
specific hybridization to that specific clone. The combination of 300 of these measures of 
hybridization for 300 probes equals the "hybridization signature" for a specific clone. Clones 
with similar sequence will have similar hybridization signatures. By developing a 
sorting/grouping algorithm to analyze these signatures, groups of clones in a library can be 
identified and brought together computationally. These groups of clones are termed "clusters". 
Depending on the stringency of the selection in the algorithm (similar to the stringency of 
hybridization in a classic library cDNA screening protocol), the "purity" of each cluster can be 
controlled. For example, artifacts of clustering may occur in computational clustering just as 
artifacts can occur in "wet-lab" screening of a cDNA library with 400 bp cDNA fragments, at 
even the highest stringency. The stringency used in the implementation of clustering herein 
provides groups of clones that are in general from the same cDNA or closely related cDNAs. 
Closely related clones can be a result of different length clones of the same cDNA, closely 
related clones from highly related gene families, or splice variants of the same cDNA. 
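
The grouping of clones by hybridization signature can be illustrated with a minimal sketch. 
The specific sorting/grouping algorithm used herein is not set out above, so the following is 
an assumption made purely for illustration: each clone's signature is treated as a vector of 
per-probe hybridization measures, similarity is scored by Pearson correlation, and a clone 
joins the first existing cluster whose founding signature it matches at or above a chosen 
stringency. The names pearson, cluster_clones and stringency are hypothetical. 

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length hybridization signatures."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat signature carries no pattern to compare
    return cov / math.sqrt(var_a * var_b)


def cluster_clones(signatures, stringency=0.9):
    """Greedily group clones whose signatures correlate at or above `stringency`.

    signatures: dict mapping clone id -> list of per-probe hybridization measures
    (300 values per clone in the scheme described above; any length works here).
    Returns a list of clusters, each a list of clone ids.
    """
    clusters = []  # each entry: (founding signature, [member clone ids])
    for clone_id, signature in signatures.items():
        for founder, members in clusters:
            if pearson(founder, signature) >= stringency:
                members.append(clone_id)
                break
        else:
            # No sufficiently similar cluster exists, so this clone founds a new one.
            clusters.append((signature, [clone_id]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical four-probe signatures, shortened for readability.
    toy = {
        "cloneA": [1.0, 2.0, 3.0, 4.0],
        "cloneB": [1.1, 2.0, 3.2, 3.9],   # similar to cloneA
        "cloneC": [4.0, 3.0, 2.0, 1.0],   # opposite pattern
    }
    print(cluster_clones(toy, stringency=0.9))
```

Raising the stringency value in such a scheme yields smaller, purer clusters, and lowering it 
merges more distantly related clones, which parallels the trade-off described above. 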

Differential expression for a selected cluster was assessed by first determining the 
number of cDNA clones corresponding to the selected cluster in the first library (Clones in 1), 
and then determining the number of cDNA clones corresponding to the selected cluster in the 
second library (Clones in 2). Differential expression of the selected cluster in the first library 
relative to the second library is expressed as a "ratio" of percent expression between the two 
libraries. In general, the "ratio" is calculated by: 1) calculating the percent expression of the 
selected cluster in the first library by dividing the number of clones corresponding to a selected 
cluster in the first library by the total number of clones analyzed from the first library; 
2) calculating the percent expression of the selected cluster in the second library by dividing the 
number of clones corresponding to a selected cluster in a second library by the total number of 
clones analyzed from the second library; 3) dividing the calculated percent expression from the 
first library by the calculated percent expression from the second library. If the "number of 
clones" corresponding to a selected cluster in a library is zero, the value is set at 1 to aid in 
calculation. The formula used in calculating the ratio takes into account the "depth" of each of 
the libraries being compared, ie., the total number of clones analyzed in each library. 
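
The three-step ratio calculation can be sketched as follows. The function names and the library depths in the usage lines are hypothetical and chosen only for illustration; the actual numbers of clones analyzed per library are those given in Tables 5 to 7.

```python
def percent_expression(clones_in_cluster, total_clones):
    """Percent of a library's analyzed clones that belong to the cluster."""
    # A clone count of zero is set to 1 to aid in calculation.
    clones = clones_in_cluster if clones_in_cluster > 0 else 1
    return 100.0 * clones / total_clones


def expression_ratio(clones_1, total_1, clones_2, total_2):
    """Percent expression in the first library divided by that in the second."""
    return percent_expression(clones_1, total_1) / percent_expression(clones_2, total_2)


# Hypothetical depths: 10,000 clones analyzed in library 1, 20,000 in library 2.
print(expression_ratio(10, 10000, 0, 20000))
print(expression_ratio(6, 10000, 3, 10000))
```

Because each count is normalized by its own library depth, a cluster with the same raw clone count in two libraries of different depth does not yield a ratio of 1.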

In general, a polynucleotide is said to be significantly differentially expressed between 
two samples when the ratio value is greater than at least about 2, preferably greater than at least 
about 3, more preferably greater than at least about 5, where the ratio value is calculated using 
the method described above. The significance of differential expression is determined using a 
z score test (Zar, Biostatistical Analysis, Prentice Hall, Inc., USA, "Differences between 
Proportions," pp 296-298 (1974)). 
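
A z score for the difference between two proportions can be sketched as below. This uses the common pooled-variance form without a continuity correction, which is an assumption; the exact variant applied from the cited reference is not stated here, and the library depths in the usage line are hypothetical.

```python
from math import sqrt


def two_proportion_z(clones_1, total_1, clones_2, total_2):
    """Pooled two-proportion z score (assumed form, no continuity correction)."""
    p1 = clones_1 / total_1
    p2 = clones_2 / total_2
    pooled = (clones_1 + clones_2) / (total_1 + total_2)
    std_err = sqrt(pooled * (1.0 - pooled) * (1.0 / total_1 + 1.0 / total_2))
    if std_err == 0:
        return 0.0  # no clones (or only clones) of this cluster in either library
    return (p1 - p2) / std_err


# Hypothetical depths of 10,000 clones per library.
print(two_proportion_z(31, 10000, 4, 10000))
```

A larger absolute z score indicates that the observed difference in clone proportions is less likely to arise from sampling alone.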

Tables 5 to 7 (inserted before the claims) show the number of clones in each of the 
above libraries that were analyzed for differential expression. Examples of differentially 
expressed polynucleotides of particular interest are described in more detail below. 

Example 5: Polynucleotides Differentially Expressed in High Metastatic Potential Breast 

Cancer Cells Versus Low Metastatic Breast Cancer Cells 

A number of polynucleotide sequences have been identified that are differentially 
expressed between cells derived from high metastatic potential breast cancer tissue and low 
metastatic breast cancer cells. Expression of these sequences in breast cancer can be valuable in 
determining diagnostic, prognostic and/or treatment information. For example, sequences that 
are highly expressed in the high metastatic potential cells can be indicative of increased 
expression of genes or regulatory sequences involved in the metastatic process. A patient 
sample displaying an increased level of one or more of these polynucleotides may thus warrant 
more aggressive treatment. In another example, sequences that display higher expression in the 
low metastatic potential cells can be associated with genes or regulatory sequences that inhibit 
metastasis, and thus the expression of these polynucleotides in a sample may warrant a more 
positive prognosis than the gross pathology would suggest. 

The differential expression of these polynucleotides can be used as a diagnostic marker, 
a prognostic marker, for risk assessment, patient treatment and the like. These polynucleotide 
sequences can also be used in combination with other known molecular and/or biochemical 
markers. 

The following table summarizes identified polynucleotides with differential expression 
between high metastatic potential breast cancer cells and low metastatic potential breast cancer 
cells. 

Table 8. Differentially expressed polynucleotides: High metastatic potential breast cancer vs. low metastatic breast cancer cells 

SEQ ID NO. | Differential Expression | Cluster ID | Clones in 1st Library | Clones in 2nd Library | Ratio
9 | High Breast > Low Breast (Lib3 > Lib4) | 2623 | 31 | 4 | 7.561356
42 | High Breast > Low Breast (Lib3 > Lib4) | 307 | 196 | 75 | 2.549721
52 | High Breast > Low Breast (Lib3 > Lib4) | 19 | 1364 | 525 | 2.534854
62 | High Breast > Low Breast (Lib3 > Lib4) | 2623 | 31 | 4 | 7.561356
65 | High Breast > Low Breast (Lib3 > Lib4) | 5749 | 9 | 0 | 8.780930
66 | High Breast > Low Breast (Lib3 > Lib4) | 6455 | 6 | 0 | 5.853953
68 | High Breast > Low Breast (Lib3 > Lib4) | 6455 | 6 | 0 | 5.853953
[illegible] | High Breast > Low Breast (Lib3 > Lib4) | 2030 | 32 | 4 | 7.805271
[illegible] | High Breast > Low Breast (Lib3 > Lib4) | 3389 | 13 | 2 | 6.341782
144 | High Breast > Low Breast (Lib3 > Lib4) | 4623 | 12 | 2 | 5.853953
172 | High Breast > Low Breast (Lib3 > Lib4) | 102 | 278 | 116 | 2.338217
[illegible] | High Breast > Low Breast (Lib3 > Lib4) | 3681 | 10 | 1 | 9.756589
214 | High Breast > Low Breast (Lib3 > Lib4) | 3900 | 8 | 1 | 7.805271
219 | High Breast > Low Breast (Lib3 > Lib4) | 3389 | 13 | 2 | 6.341782
[illegible] | High Breast > Low Breast (Lib3 > Lib4) | 1399 | 19 | 7 | 2.648217
[illegible] | High Breast > Low Breast (Lib3 > Lib4) | 4837 | 10 | 0 | 9.756589
317 | High Breast > Low Breast (Lib3 > Lib4) | 1577 | 25 | 3 | 8.130490
379 | High Breast > Low Breast (Lib3 > Lib4) | 260 | 27 | 2 | 13.17139
[illegible] | Low Breast > High Breast (Lib4 > Lib3) | 3706 | [illegible] | 4 | 5.637215
39 | Low Breast > High Breast (Lib4 > Lib3) | 4016 | 6 | 0 | 6.149690
74 | Low Breast > High Breast (Lib4 > Lib3) | 6268 | 18 | 3 | 6.149690
[illegible] | Low Breast > High Breast (Lib4 > Lib3) | [illegible] | 8 | 1 | 8.199586
[illegible] | Low Breast > High Breast (Lib4 > Lib3) | 13183 | 7 | 0 | 7.174638
[illegible] | Low Breast > High Breast (Lib4 > Lib3) | [illegible] | 9 | 0 | 9.224535
162 | Low Breast > High Breast (Lib4 > Lib3) | 9685 | 7 | 0 | 7.174638
183 | Low Breast > High Breast (Lib4 > Lib3) | 7337 | 16 | 3 | 5.466391
202 | Low Breast > High Breast (Lib4 > Lib3) | 6124 | 9 | 1 | 9.224535
298 | Low Breast > High Breast (Lib4 > Lib3) | 1037 | 22 | 4 | 5.637215
338 | Low Breast > High Breast (Lib4 > Lib3) | 689 | 36 | 17 | 2.170478
384 | Low Breast > High Breast (Lib4 > Lib3) | 697 | 72 | 30 | 2.459876
386 | Low Breast > High Breast (Lib4 > Lib3) | 4568 | 9 | 0 | 9.224535
388 | Low Breast > High Breast (Lib4 > Lib3) | 5622 | 13 | 2 | 6.662164



Example 6: Polynucleotides Differentially Expressed in High Metastatic Potential Lung 

Cancer Cells Versus Low Metastatic Lung Cancer Cells 

A number of polynucleotide sequences have been identified that are differentially 
expressed between cells derived from high metastatic potential lung cancer tissue and low 
metastatic lung cancer cells. Expression of these sequences in lung cancer tissue can be 
valuable in determining diagnostic, prognostic and/or treatment information. For example, 
sequences that are highly expressed in the high metastatic potential cells can be 
indicative of increased expression of genes or regulatory sequences involved in the metastatic 
process. A patient sample displaying an increased level of one or more of these polynucleotides 
may thus warrant more aggressive treatment. In another example, sequences that display higher 
expression in the low metastatic potential cells can be associated with genes or regulatory 
sequences that inhibit metastasis, and thus the expression of these polynucleotides in a sample 
may warrant a more positive prognosis than the gross pathology would suggest. 

The differential expression of these polynucleotides can be used as a diagnostic marker, 
a prognostic marker, for risk assessment, patient treatment and the like. These polynucleotide 
sequences can also be used in combination with other known molecular and/or biochemical 
markers. 

The following table summarizes identified polynucleotides with differential expression 
between high metastatic potential lung cancer cells and low metastatic potential lung cancer 
cells: 



Table 9. Differentially expressed polynucleotides: High metastatic potential lung cancer vs. low metastatic lung cancer cells 

SEQ ID NO. | Differential Expression | Cluster ID | Clones in 1st Library | Clones in 2nd Library | Ratio
400 | High Lung > Low Lung (Lib8 > Lib9) | 14929 | 23 | 16 | 2.008868
9 | High Lung > Low Lung (Lib8 > Lib9) | 2623 | 6 | 1 | 8.384840
34 | High Lung > Low Lung (Lib8 > Lib9) | 5832 | 5 | 0 | 6.987366
42 | High Lung > Low Lung (Lib8 > Lib9) | 307 | 79 | 27 | 4.088903
62 | High Lung > Low Lung (Lib8 > Lib9) | 2623 | 6 | 1 | 8.384840
74 | High Lung > Low Lung (Lib8 > Lib9) | 6268 | 5 | 0 | 6.987366
106 | High Lung > Low Lung (Lib8 > Lib9) | 10717 | 8 | 0 | 11.17978
119 | High Lung > Low Lung (Lib8 > Lib9) | 8 | 1355 | 122 | 15.52111
361 | High Lung > Low Lung (Lib8 > Lib9) | 1120 | 5 | 0 | 6.987366
369 | High Lung > Low Lung (Lib8 > Lib9) | 2790 | 6 | 0 | 8.384840
371 | High Lung > Low Lung (Lib8 > Lib9) | 8847 | 6 | 1 | 8.384840
379 | High Lung > Low Lung (Lib8 > Lib9) | 260 | 15 | 0 | 20.96210
395 | High Lung > Low Lung (Lib8 > Lib9) | 13538 | 9 | 1 | 12.57726
135 | Low Lung > High Lung (Lib9 > Lib8) | 36313 | 30 | 1 | 21.46731
154 | Low Lung > High Lung (Lib9 > Lib8) | 5345 | 27 | 6 | 3.220097
160 | Low Lung > High Lung (Lib9 > Lib8) | 4386 | 21 | 3 | 5.009039
260 | Low Lung > High Lung (Lib9 > Lib8) | 4141 | [illegible] | [illegible] | [illegible]
308 | Low Lung > High Lung (Lib9 > Lib8) | 15855 | 213 | 12 | 12.70149
323 | Low Lung > High Lung (Lib9 > Lib8) | 5257 | 25 | 5 | 3.577885
349 | Low Lung > High Lung (Lib9 > Lib8) | 2797 | 14 | 1 | 10.01807
381 | Low Lung > High Lung (Lib9 > Lib8) | 2428 | 19 | 2 | 6.797982



Example 7: Polynucleotides Differentially Expressed in High Metastatic Potential Colon 

Cancer Cells Versus Low Metastatic Colon Cancer Cells 

A number of polynucleotide sequences have been identified that are differentially 
expressed between cells derived from high metastatic potential colon cancer tissue and low 
metastatic colon cancer cells. Expression of these sequences in colon cancer tissue can be 
valuable in determining diagnostic, prognostic and/or treatment information. For example, 
sequences that are highly expressed in the high metastatic potential cells can be indicative of 
increased expression of genes or regulatory sequences involved in the metastatic process. A 
patient sample displaying an increased level of one or more of these polynucleotides may thus 
warrant more aggressive treatment. In another example, sequences that display higher 
expression in the low metastatic potential cells can be associated with genes or regulatory 
sequences that inhibit metastasis, and thus the expression of these polynucleotides in a sample 
may warrant a more positive prognosis than the gross pathology would suggest. 

The differential expression of these polynucleotides can be used as a diagnostic marker, 
a prognostic marker, for risk assessment, patient treatment and the like. These polynucleotide 
sequences can also be used in combination with other known molecular and/or biochemical 
markers. 

The following table summarizes identified polynucleotides with differential expression 
between high metastatic potential colon cancer cells and low metastatic potential colon cancer 
cells: 

Table 10. Differentially expressed polynucleotides: High metastatic potential colon cancer vs. low metastatic colon cancer cells 

SEQ ID NO. | Differential Expression | Cluster ID | Clones in 1st Library | Clones in 2nd Library | Ratio
1 | High Colon > Low Colon (Lib1 > Lib2) | 6660 | 7 | 0 | 6.489973
[illegible] | High Colon > Low Colon (Lib1 > Lib2) | [illegible] | 19 | 6 | 2.935940
[illegible] | High Colon > Low Colon (Lib1 > Lib2) | 4275 | 11 | 2 | 5.099264
[illegible] | High Colon > Low Colon (Lib1 > Lib2) | [illegible] | [illegible] | 0 | 7.417112
[illegible] | High Colon > Low Colon (Lib1 > Lib2) | 6420 | 8 | 0 | 7.417112
39 | Low Colon > High Colon (Lib2 > Lib1) | 4016 | 14 | 5 | 3.020043
97 | Low Colon > High Colon (Lib2 > Lib1) | 945 | 21 | 9 | 2.516702
134 | Low Colon > High Colon (Lib2 > Lib1) | 2464 | 19 | 5 | 4.098630
317 | Low Colon > High Colon (Lib2 > Lib1) | 1577 | 40 | 12 | 3.595289
357 | Low Colon > High Colon (Lib2 > Lib1) | 4309 | 13 | 4 | 3.505407



Example 8: Polynucleotides Differentially Expressed at Higher Levels in High Metastatic 

Potential Colon Cancer Patient Tissue Versus Normal Patient Tissue 

A number of polynucleotide sequences have been identified that are differentially 
expressed between cells derived from high metastatic potential colon cancer tissue and normal 
tissue. Expression of these sequences in colon cancer tissue can be valuable in determining 
diagnostic, prognostic and/or treatment information. For example, sequences that are highly 
expressed in the high metastatic potential cells can be indicative of increased 
expression of genes or regulatory sequences involved in the advanced disease state, which 
involves processes such as angiogenesis, dedifferentiation, cell replication, and metastasis. A 
patient sample displaying an increased level of one or more of these polynucleotides may thus 
warrant more aggressive treatment. 

The differential expression of these polynucleotides can be used as a diagnostic marker, 
a prognostic marker, for risk assessment, patient treatment and the like. These polynucleotide 
sequences can also be used in combination with other known molecular and/or biochemical 
markers. 

The following table summarizes identified polynucleotides with differential expression 
between high metastatic potential colon cancer cells and normal colon cells: 

Table 11: Differentially expressed polynucleotides: High metastatic potential colon tissue vs. normal colon tissue 

SEQ ID NO. | Differential Expression | Cluster ID | Clones in 1st Library | Clones in 2nd Library | Ratio
52 | High Colon Metastasis Tissue > Normal Colon Tissue of UC#3 (Lib20 > Lib18) | 19 | 10 | 0 | 11.69918
52 | High Colon Metastasis Tissue > Normal Tissue in UC#2 (Lib17 > Lib15) | 19 | 13 | 2 | 6.025646
172 | High Colon Metastasis Tissue > Normal Tissue in UC#2 (Lib17 > Lib15) | 102 | 65 | 22 | 2.738930



Example 9: Polynucleotides Differentially Expressed at Higher Levels in High Colon Tumor 

Potential Patient Tissue Versus Metastasized Colon Cancer Patient Tissue 

A number of polynucleotide sequences have been identified that are differentially 
expressed between cells derived from high tumor potential colon cancer tissue and cells derived 
from high metastatic potential colon cancer cells. Expression of these sequences in colon cancer 
tissue can be valuable in determining diagnostic, prognostic and/or treatment information 
associated with the transformation of precancerous tissue to malignant tissue. This information 
can be useful in the prevention of achieving the advanced malignant state in these tissues, and 
can be important in risk assessment for a patient. 

The following table summarizes identified polynucleotides with differential expression 
between high tumor potential colon cancer tissue and cells derived from high metastatic 
potential colon cancer cells: 



Table 12: Differentially expressed polynucleotides: High tumor potential colon tissue vs. metastatic colon tissue 

SEQ ID NO. | Differential Expression | Cluster ID | Clones in 1st Library | Clones in 2nd Library | Ratio
52 | High Colon Tumor Tissue > Metastasis Tissue of UC#3 (Lib19 > Lib20) | 19 | 69 | 10 | 5.160829
119 | High Colon Tumor Tissue > Metastasis Tissue of UC#3 (Lib19 > Lib20) | 8 | 14 | 1 | 10.47124
172 | High Colon Tumor Tissue > Metastasis Tissue of UC#3 (Lib19 > Lib20) | 102 | 43 | 10 | 3.216168

Example 10: Polynucleotides Differentially Expressed at Higher Levels in High Tumor 

Potential Colon Cancer Patient Tissue Versus Normal Patient Tissue 

A number of polynucleotide sequences have been identified that are differentially 
expressed between cells derived from high tumor potential colon cancer tissue and normal 
tissue. Expression of these sequences in colon cancer tissue can be valuable in determining 
diagnostic, prognostic and/or treatment information associated with the prevention of achieving 
the malignant state in these tissues, and can be important in risk assessment for a patient. For 
example, sequences that are highly expressed in the high tumor potential colon cancer cells can be 
associated with, or indicative of, increased expression of genes or regulatory sequences involved in 
early tumor progression. A patient sample displaying an increased level of one or more of these 
polynucleotides may thus warrant closer attention or more frequent screening procedures to 
catch the malignant state as early as possible. 

The following table summarizes identified polynucleotides with differential expression 
between high tumor potential colon cancer cells and normal colon cells: 



Table 13: Differentially expressed polynucleotides: High tumor potential colon tissue vs. normal colon tissue 

SEQ ID NO. | Differential Expression | Cluster ID | Clones in 1st Library | Clones in 2nd Library | Ratio
52 | High Colon Tumor Tissue > Normal Tissue of UC#2 (Lib16 > Lib15) | 19 | 13 | 2 | 6.255508
288 | High Colon Tumor Tissue > Normal Tissue of UC#2 (Lib16 > Lib15) | 1267 | 7 | 0 | 6.125253
52 | High Colon Tumor Tissue > Normal Tissue of UC#3 (Lib19 > Lib18) | 19 | 69 | 0 | 60.37750
119 | High Colon Tumor Tissue > Normal Tissue of UC#3 (Lib19 > Lib18) | 8 | 14 | 1 | 12.25050
172 | High Colon Tumor Tissue > Normal Tissue of UC#3 (Lib19 > Lib18) | 102 | 43 | 7 | 5.375222



Example 11: Polynucleotides Differentially Expressed Across Multiple Libraries 

A number of polynucleotide sequences have been identified that are differentially 
expressed between cancerous cells and normal cells across all three tissue types tested (i.e., 
breast, colon, and lung). Expression of these sequences in a tissue of any origin can be valuable 
in determining diagnostic, prognostic and/or treatment information associated with the 
prevention of achieving the malignant state in these tissues, and can be important in risk 
assessment for a patient. These polynucleotides can also serve as non-tissue specific markers 
of, for example, risk of metastasis of a tumor. The following table summarizes identified 
polynucleotides that were differentially expressed but without tissue type-specificity in the 
breast, colon, and lung libraries tested. 



Table 14: Polynucleotides Differentially Expressed Across Multiple Library Comparisons 

SEQ ID NO. | Differential Expression | Cluster ID | Clones in 1st Library | Clones in 2nd Library | Ratio
9 | High Breast > Low Breast (Lib3 > Lib4) | 2623 | 31 | 4 | 7.561356
  | High Lung > Low Lung (Lib8 > Lib9) | 2623 | 6 | 1 | 8.384840
39 | Low Breast > High Breast (Lib4 > Lib3) | 4016 | 6 | 0 | 6.149690
  | Low Colon > High Colon (Lib2 > Lib1) | 4016 | 14 | 5 | 3.020043
42 | High Breast > Low Breast (Lib3 > Lib4) | 307 | 196 | 75 | 2.549721
  | High Lung > Low Lung (Lib8 > Lib9) | 307 | 79 | 27 | 4.088903
52 | High Breast > Low Breast (Lib3 > Lib4) | 19 | 1364 | 525 | 2.534854
  | High Colon Metastasis Tissue > Normal Colon Tissue of UC#3 (Lib20 > Lib18) | 19 | 10 | 0 | 11.69918
  | High Colon Metastasis Tissue > Normal Tissue in UC#2 (Lib17 > Lib15) | 19 | 13 | 2 | 6.025646
  | High Colon Tumor Tissue > Metastasis Tissue of UC#3 (Lib19 > Lib20) | 19 | 69 | 10 | 5.160829
  | High Colon Tumor Tissue > Normal Tissue of UC#2 (Lib16 > Lib15) | 19 | 13 | 2 | 6.255508
  | High Colon Tumor Tissue > Normal Tissue of UC#3 (Lib19 > Lib18) | 19 | 69 | 0 | 60.37750
62 | High Breast > Low Breast (Lib3 > Lib4) | 2623 | 31 | 4 | 7.561356
  | High Lung > Low Lung (Lib8 > Lib9) | 2623 | 6 | 1 | 8.384840
74 | High Lung > Low Lung (Lib8 > Lib9) | 6268 | 5 | 0 | 6.987366
  | Low Breast > High Breast (Lib4 > Lib3) | 6268 | 18 | 3 | 6.149690
119 | High Colon Tumor Tissue > Metastasis Tissue of UC#3 (Lib19 > Lib20) | 8 | 14 | 1 | 10.47124
  | High Colon Tumor Tissue > Normal Tissue of UC#3 (Lib19 > Lib18) | 8 | 14 | 1 | 12.25050
  | High Lung > Low Lung (Lib8 > Lib9) | 8 | 1355 | 122 | 15.52111
172 | High Breast > Low Breast (Lib3 > Lib4) | 102 | 278 | 116 | 2.338217
  | High Colon Metastasis Tissue > Normal Tissue in UC#2 (Lib17 > Lib15) | 102 | 65 | 22 | 2.738930
  | High Colon Tumor Tissue > Metastasis Tissue of UC#3 (Lib19 > Lib20) | 102 | 43 | 10 | 3.216168
  | High Colon Tumor Tissue > Normal Tissue of UC#3 (Lib19 > Lib18) | 102 | 43 | 7 | 5.375222
317 | High Breast > Low Breast (Lib3 > Lib4) | 1577 | 25 | 3 | 8.130490
  | Low Colon > High Colon (Lib2 > Lib1) | 1577 | 40 | 12 | 3.595289
379 | High Breast > Low Breast (Lib3 > Lib4) | 260 | 27 | 2 | 13.17139
  | High Lung > Low Lung (Lib8 > Lib9) | 260 | 15 | 0 | 20.96210



Example 12: Polynucleotides Exhibiting Colon-Specific Expression 

The cDNA libraries described herein were also analyzed to identify those 
polynucleotides that were specifically expressed in colon cells or tissue, i.e., the polynucleotides 
were identified in libraries prepared from colon cell lines or tissue, but not in libraries of breast 
or lung origin. The polynucleotides that were expressed in a colon cell line and/or in colon 
tissue, but were not present in the breast or lung cDNA libraries described herein, are shown in 
Table 15. 
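
The colon-specific selection can be sketched as a simple filter over per-library clone counts. The data layout, the function name, and the assignment of Lib1 and Lib2 to colon are assumptions made for illustration; the counts in the usage lines are hypothetical except where they echo values listed in the tables herein.

```python
def colon_specific_clusters(cluster_counts, colon_libs, other_libs):
    """Return cluster IDs seen in at least one colon library and in none of
    the breast or lung libraries."""
    selected = []
    for cluster_id, counts in cluster_counts.items():
        in_colon = any(counts.get(lib, 0) > 0 for lib in colon_libs)
        in_other = any(counts.get(lib, 0) > 0 for lib in other_libs)
        if in_colon and not in_other:
            selected.append(cluster_id)
    return sorted(selected)


# cluster ID -> {library name: number of clones}; layout is hypothetical.
counts = {
    36535: {"Lib1": 2, "Lib2": 0},
    2623: {"Lib3": 31, "Lib4": 4, "Lib8": 6, "Lib9": 1},
    99999: {"Lib1": 1, "Lib3": 1},  # invented cluster present in colon and breast
}
print(colon_specific_clusters(counts, ["Lib1", "Lib2"], ["Lib3", "Lib4", "Lib8", "Lib9"]))
```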



Table 15. Polynucleotides specifically expressed in colon cells. 

SEQ ID NO. | Cluster | Clones in 1st Library | Clones in 2nd Library | SEQ ID NO. | Cluster | Clones in 1st Library | Clones in 2nd Library
5 | 36535 | 2 | 0 | 229 | 39648 | 2 | 0
13 | 27250 | 2 | 0 | 231 | 85064 | 1 | 0
19 | 16283 | 3 | 0 | 234 | 39391 | 2 | 0
24 | 16918 | 4 | 0 | 236 | 39498 | 2 | 0
26 | 40108 | 2 | 0 | 242 | 22113 | 3 | 0
32 | 32663 | 1 | 1 | 247 | 19255 | 2 | 0
43 | 39833 | 2 | 0 | 252 | 22814 | 3 | 0
47 | 18957 | 3 | 0 | 253 | 39563 | 2 | 0
48 | 39508 | 2 | 0 | 254 | 39420 | 2 | 0
56 | 7005 | 8 | 2 | 257 | 39412 | 2 | 0
58 | 18957 | 3 | 0 | 261 | 38085 | 2 | 0
59 | 18957 | 3 | 0 | 265 | 40054 | 1 | 0
60 | 16283 | 3 | 0 | 266 | 39423 | 2 | 0
64 | 13238 | 4 | 1 | 267 | 39453 | 2 | 0
70 | 19442 | 2 | 0 | 270 | 78091 | 1 | 0
71 | 17036 | 4 | 0 | 276 | 39168 | 2 | 0
71 | 7005 | 8 | 2 | 277 | 39458 | 2 | 0
81 | 11476 | 6 | 0 | 278 | 14391 | 3 | 1
86 | 39425 | 2 | 0 | 279 | 39195 | 2 | 0
94 | 21847 | 2 | 1 | 282 | 12977 | 5 | 0
100 | 16731 | 1 | 1 | 284 | 14391 | 3 | 1
101 | 17439 | 4 | 0 | 290 | 16347 | 4 | 0
111 | 17055 | 4 | 0 | 293 | 39478 | 2 | 0
[illegible] | 67907 | 1 | 0 | 294 | 39392 | 2 | 0
[illegible] | 19081 | 4 | 0 | 297 | 39180 | 2 | 0
[illegible] | [illegible] | 9 | 0 | 299 | 6867 | 7 | 3
[illegible] | 8910 | 9 | 6 | 301 | 41633 | 1 | 1
[illegible] | [illegible] | 9 | 0 | 302 | 23218 | 3 | 0
[illegible] | 99195 | 1 | 0 | 303 | 39380 | 2 | 0
[illegible] | 86859 | 1 | 0 | 309 | 84328 | 1 | 0
[illegible] | 8679 | 4 | 4 | 314 | 14367 | 3 | 0
[illegible] | 16077 | 4 | 0 | 320 | 39886 | 2 | 0
[illegible] | 17016 | 4 | 0 | 324 | 9061 | 5 | 2
[illegible] | [illegible] | 9 | 0 | 327 | 16653 | 3 | 1
161 | 40044 | 9 | 0 | 328 | 16985 | 4 | 0
[illegible] | [illegible] | 1 | 0 | 329 | 12977 | 5 | 0
166 | [illegible] | 4 | 0 | 330 | 9061 | 5 | 2
170 | [illegible] | [illegible] | 0 | 333 | 16392 | 3 | 0
176 | [illegible] | 19 | 6 | 342 | 39486 | 2 | 0
[illegible] | [illegible] | 1 | 0 | 344 | 6874 | 6 | 3
[illegible] | 10648 | 9 | 0 | 345 | 6874 | 6 | 3
[illegible] | [illegible] | 4 | 0 | 353 | 11494 | 4 | 0
186 | 99794 | 9 | 0 | 354 | 17062 | 3 | 0
187 | 19171 | 2 | 0 | 355 | 16245 | 4 | 0
[illegible] | [illegible] | 9 | 0 | 356 | 83103 | 1 | 0
[illegible] | 16117 | 1 | 0 | 358 | 13072 | 4 | 1
[illegible] | 19186 | 2 | 0 | 366 | 14364 | 1 | 0
[illegible] | 40199 | 9 | 0 | 368 | 84182 | 1 | 0
218 | 26295 | 2 | 0 | 372 | 56020 | 1 | 0
222 | 4665 | 5 | 9 | 389 | 7514 | 5 | 3
226 | 82498 | 1 | 0 | 391 | 7570 | 5 | 3
227 | 35702 | 2 | 0 | 393 | 23210 | 3 | 0



In addition to the above, SEQ ID NOS:159 and 161 were each present in one clone in 
each of Lib16 (Normal Colon Tumor Tissue), and SEQ ID NOS:344 and 345 were each present 
in one clone in Lib17 (High Colon Metastasis Tissue). No clones corresponding to the colon-specific 
polynucleotides in the table above were present in any of Libraries 3, 4, 8, or 9. The 
polynucleotides provided above can be used as markers of cells of colon origin, and find 
particular use in reference arrays, as described above. 

Example 13: Identification of Contiguous Sequences Having a Polynucleotide of the Invention 

The novel polynucleotides were used to screen publicly available and proprietary 
databases to determine if any of the polynucleotides of SEQ ID NOS:1-404 would facilitate 
identification of a contiguous sequence, e.g., the polynucleotides would provide sequence that 
would result in 5' extension of another DNA sequence, resulting in production of a longer 
contiguous sequence composed of the provided polynucleotide and the other DNA sequence(s). 
Contiging was performed using the AssemblyLign program with the following parameters: 1) 
Overlap: Minimum Overlap Length: 30; % Stringency: 50; Minimum Repeat Length: 30; 
Alignment: gap creation penalty: 1.00, gap extension penalty: 1.00; 2) Consensus: % Base 
designation threshold: 80. 
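
How a minimum overlap length and a percent stringency can jointly decide whether two sequences are joined is sketched below. This is a simplified, ungapped end-to-end overlap check written for illustration only; it is not the AssemblyLign algorithm, and the function names and the short test sequences are hypothetical.

```python
def best_overlap(left, right, min_overlap, min_identity):
    """Length of the longest overlap between the 3' end of `left` and the
    5' end of `right` that meets both thresholds, or 0 if there is none."""
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        tail, head = left[-length:], right[:length]
        matches = sum(1 for a, b in zip(tail, head) if a == b)
        if 100.0 * matches / length >= min_identity:
            return length
    return 0


def extend_contig(left, right, min_overlap=30, min_identity=50.0):
    """Join two sequences across their overlap, or return None if they do not
    overlap under the given settings. Defaults echo the parameters above."""
    length = best_overlap(left, right, min_overlap, min_identity)
    if length == 0:
        return None
    return left + right[length:]


# Short hypothetical sequences with relaxed settings so the overlap is visible.
print(extend_contig("ACGTACGTAA", "CGTAAGGG", min_overlap=5, min_identity=100.0))
```

Lowering either threshold in this sketch admits more joins, which is the same trade-off reflected in the relaxed (Stringency/Overlap) settings reported for some of the contiged sequences.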

Using these parameters, 44 polynucleotides provided contiged sequences. These 
contiged sequences are provided as SEQ ID NOS:801-844. The contiged sequences can be 
correlated with the sequences of SEQ ID NOS:1-404 upon which the contiged sequences are 
based by identifying those sequences of SEQ ID NOS:1-404 and the contiged sequences of SEQ 
ID NOS:801-844 that share the same clone name in Table 1. It should be noted that of these 44 
sequences that provided a contiged sequence, the following members of that group of 44 did not 
contig using the overlap settings indicated in parentheses (Stringency/Overlap): SEQ ID 
NO:804 (30%/10); SEQ ID NO:810 (20%/20); SEQ ID NO:812 (30%/10); SEQ ID NO:814 
(40%/20); SEQ ID NO:816 (30%/10); SEQ ID NO:832 (30%/10); SEQ ID NO:840 (20%/20); 
SEQ ID NO:841 (40%/20). To generalize, the indicated polynucleotides did not contig using a 
minimum 20% stringency, 10 overlap. There was a corresponding increase in the number of 
degenerate codons in these sequences. 

The contiged sequences (SEQ ID NOS:801-844) thus represent longer sequences that 
encompass a polynucleotide sequence of the invention. The contiged sequences were then 
translated in all three reading frames to determine the best alignment with individual sequences 
using the BLAST programs as described above for SEQ ID NOS:1-404 and the validation 
sequences SEQ ID NOS:405-800. Again the sequences were masked using the XBLAST 
program for masking low complexity as described above in Example 1 (Table 2). Several of the 
contiged sequences were found to encode polypeptides having characteristics of a polypeptide 
belonging to a known protein family (and thus represent new members of these protein 
families) and/or comprising a known functional domain (Table 16). Thus the invention 
encompasses fragments, fusions, and variants of such polynucleotides that retain biological 
activity associated with the protein family and/or functional domain identified herein. 



Table 16. Profile hits using contiged sequences 

SEQ ID NO. | Sequence Name | Profile | Start (Stop) | Score
809 | Contig RTA00000177AF.n.18.3. Seq_THC123051 | ATPases | 778 (1612) | 6040
824 | Contig RTA00000187AF.g.24.1. Seq_THC168636 | homeobox | 531 (707) | 12080
824 | Contig RTA00000187AF.g.24.1. Seq_THC168636 | MAP kinase kinase | 769 (1494) | 5784
833 | Contig RTA00000190AF.j.4.1. Seq_THC228776 | protein kinase | 170 (1010) | 5027
833 | Contig RTA00000190AF.j.4.1. Seq_THC228776 | protein kinase | 170 (1010) | 5027

All stop/start sequences are provided in the forward direction. 



The profiles for the ATPases (AAA) and protein kinase families are described above in 
Example 2. The homeobox and MAP kinase kinase protein families are described further 
below. 

Homeobox domain. The 'homeobox' is a protein domain of 60 amino acids (Gehring In: 
Guidebook to the Homeobox Genes, Duboule D., Ed., pp1-10, Oxford University Press, 
Oxford, (1994); Buerglin In: Guidebook to the Homeobox Genes, pp25-72, Oxford University 
Press, Oxford, (1994); Gehring Trends Biochem. Sci. (1992) 17:277-280; Gehring et al. Annu. 
Rev. Genet. (1986) 20:147-173; Schofield Trends Neurosci. (1987) 10:3-6; 
http://copan.bioz.unibas.ch/homeo.html) first identified in a number of Drosophila homeotic and 
segmentation proteins. It is extremely well conserved in many other animals, including 
vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Several 
proteins that contain a homeobox domain play an important role in development. Most of these 
proteins are sequence-specific DNA-binding transcription factors. The homeobox domain is 
also very similar to a region of the yeast mating type proteins. These are sequence-specific 
DNA-binding proteins that act as master switches in yeast differentiation by controlling gene 
expression in a cell type-specific fashion. 

A schematic representation of the homeobox domain is shown below. The helix-turn-helix 
region is shown by the symbols 'H' (for helix), and 't' (for turn). 

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx 
1                                                         60 

The pattern detects homeobox sequences 24 residues long and spans positions 34 to 57 of the 
homeobox domain. The consensus pattern is as follows: 
[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]. 
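A consensus pattern written in this bracket-and-`x(n)` notation maps directly onto a regular expression. The following is a minimal sketch, assuming the reading of the pattern given above (24 residues, with the gaps in the printed pattern filled in as hyphen-separated elements). The function names and the 60-residue test sequence are illustrative assumptions; the test sequence is a homeodomain-like string used only as input and is not taken from this specification.

```python
import re

# Consensus pattern as read from the text above (assumed reconstruction).
HOMEOBOX_PATTERN = (
    "[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-"
    "[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]"
)

# One pattern element: 'x', a single residue, or a bracketed set, with optional (n).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")


def _elements(pattern):
    """Yield (residue_spec, repeat_count) for each hyphen-separated element."""
    for element in pattern.strip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = match.groups()
        yield residue, int(count) if count else 1


def pattern_to_regex(pattern):
    """Translate the consensus pattern into a Python regular expression."""
    parts = []
    for residue, count in _elements(pattern):
        piece = "." if residue == "x" else residue
        parts.append(piece if count == 1 else f"{piece}{{{count}}}")
    return "".join(parts)


def pattern_length(pattern):
    """Number of residues matched by a fixed-length pattern."""
    return sum(count for _, count in _elements(pattern))


def find_hits(protein, pattern=HOMEOBOX_PATTERN):
    """Return (start, stop, matched_text) with 1-based inclusive coordinates."""
    # A lookahead lets overlapping matches be reported as well.
    regex = re.compile(f"(?=({pattern_to_regex(pattern)}))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(protein)
    ]


# Illustrative 60-residue homeodomain-like test input (an assumption).
TEST_DOMAIN = "RKRGRQTYTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLTERQIKIWFQNRRMKWKKEN"

if __name__ == "__main__":
    print(pattern_length(HOMEOBOX_PATTERN))
    print(find_hits(TEST_DOMAIN))
```

On the test input the single hit falls at positions 34 to 57, which agrees with the span stated for the pattern within the 60-residue domain.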

MAP kinase kinase (MAPKK). MAP kinases (MAPK) are involved in signal 
transduction, and are important in cell cycle and cell growth controls. The MAP kinase kinases 
(MAPKK) are dual-specificity protein kinases which phosphorylate and activate MAP kinases. 
MAPKK homologues have been found in yeast, invertebrates, amphibians, and mammals. 
Moreover, the MAPKK/MAPK phosphorylation switch constitutes a basic module activated in 
distinct pathways in yeast and in vertebrates. MAPKK regulation studies have led to the 
discovery of at least four MAPKK convergent pathways in higher organisms. One of these is 
similar to the yeast pheromone response pathway which includes the ste11 protein kinase. Two 
other pathways require the activation of either one or both of the serine/threonine kinase-encoded 
oncogenes c-Raf-1 and c-Mos. Additionally, several studies suggest a possible effect 
of the cell cycle control regulator cyclin-dependent kinase 1 (cdc2) on MAPKK activity. 
Finally, MAPKKs are apparently essential transducers through which signals must pass before 
reaching the nucleus. For review, see, e.g., Biologique Biol Cell (1993) 79:193-207; Nishida et 
al., Trends Biochem Sci (1993) 18:128-31; Ruderman Curr Opin Cell Biol (1993) 5:207-13; 
Dhanasekaran et al., Oncogene (1998) 17:1447-55; Kiefer et al., Biochem Soc Trans (1997) 
25:491-8; and Hill, Cell Signal (1996) 8:533-44. 

Those skilled in the art will recognize, or be able to ascertain, using not more than 
routine experimentation, many equivalents to the specific embodiments of the invention 
described herein. Such specific embodiments and equivalents are intended to be encompassed 
by the following claims. 

All publications and patent applications cited in this specification are herein incorporated 
by reference as if each individual publication or patent application were specifically and 
individually indicated to be incorporated by reference. The citation of any publication is for its 
disclosure prior to the filing date and should not be construed as an admission that the present 
invention is not entitled to antedate such publication by virtue of prior invention. 

Although the foregoing invention has been described in some detail by way of 
illustration and example for purposes of clarity of understanding, it is readily apparent to those 
of ordinary skill in the art in light of the teachings of this invention that certain changes and 
modifications may be made thereto without departing from the spirit or scope of the appended 
claims. 

Deposit Information: 

The following materials were deposited with the American Type Culture Collection 
(CMCC = Chiron Master Culture Collection): 

Cell Lines Deposited with ATCC 

Cell Line     Deposit Date      ATCC Accession No.   CMCC Accession No.
KM12L4-A      March 19, 1998    CRL-12496            11606
Km12C         May 15, 1998      CRL-12533            11611
MDA-MB-231    May 15, 1998      CRL-12532            10583
MCF-7         October 9, 1998   CRL-12584            10377

cDNA Library Deposits 

cDNA Library ES1 - ATCC# 
Deposit Date - December 22, 1998 





Clone Name 


Cluster ID 


Sequence Name 


M00001 39^A*C03 


4016 


79 Al SD6:130016.Seq 




4016 


RTA00000118A c.4.1 

IV A i *V V/ V V A JL JL • V • t m X 


MftflOO 1 449 A *D 1 2 


3681 


RTA0OO00 1 3 1 A. g. 1 5 .2 


MAft0ni44QA-ni? 


3681 


79 El sn6* 130064 Sea 




1 120 


79 C2 sd6'130041 Sea 


MUUUU14jz/\.1JUo 


1 190 

X 1 Z.VJ 


RTA000001 IRA n H 3 


M00001513A:B06 


4568 


79.D4.sp6:130055.Seq 


M00001513A:B06 


4568 


RTA00000122A.d.15.3 


M00001517A:B07 


4313 


79.F4.sp6:130079.Seq 


M00001517A:B07 


4313 


RTA00000122A.n.3.1 


M00001533A:C11 


2428 


RTA00000123A.l.21.1 


M00001533A:C11 


2428 


79.A5.sp6:130020.Seq 


M00001533A:C11 


2428 


RTA00000123A.l.21.1.Seq_THC205063 


M00001542A:A09 


22113 


79.F5.sp6:130080.Seq 


M00001542A:A09 


22113 


RTA00000125A.c.7.1 

cDNA Library ES2 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


"\vfAAAA1 lAlf^-V 1 A 

MUUUU 1343Cr 1U 


9700 

Z / "v 


RO Fl sn6- 130256 Sea 


MUUOU 1 343Cr 1 U 


97QA 

Z /7U 


RTAAAA001 77 AF e 2 1 Sea THC229461 


\ yfAAAA 1 9 /19/^«1? 1 A 

MUUUU I 3 43 l.r 1U 


97Q0 

Z 1 7U 


RTAOn000177AF e 2 1 


MUUUU 1 343 D: JHU / 


9^955 

ZJZ.J J 


100n snfv 131446 Sen 


\ /taaaa 1 1 /i rvt-TA'7 
MUUUU 1 343 D.riU / 


9^9^ 


RTA00000177AF e 14 3 Sea THC228776 


A ffAAAA 1 1 A1T\.Uf\H 

M0UUU1343D:liU / 


Z3ZJ 3 


RA F1 cn£-n0?68 Spa 


\;TAAAA1 9 /19i^.LIA'7 

MUUUU 1 343 JJ.riU / 


ZjZJ J 


RTA000001 77 AF e 1 4 3 


A if A AAA 1 1 AC A .TA1 

MUUUU 1 345A:bU 1 


£A9A 

04ZU 


179 Fl «nrv1T!Q?S <spa 


X jTAAAA 1 "5 AC A .~C A 1 

MUUUU 1 34 j A.JbU I 


AA9A 


rta AAA001 77 AF f 1 0 ^ 


X fAAAA 11/1CA.CA1 

M00001345A:bUl 


04ZU 


RTAAAAAA1 77 A F f 10 ^ Sen THC226443 

lvl/\UUvUUl / l r\r .1. 1U.J.OCLJ 1 1 iv^LLUt'r J 


MOUUU 1 345 A:bU 1 


04ZU 


RA m c«6* 1 1A98A <\pn 


X /fAAAA IT/nA ."O 1 A 

MUUUU134/A:jd1U 


1 J J /0 


8 A H9 «nfv1 1A94S Spa 


M00001347A:B10 


1 9 £9£ 
13j /O 


1 AA V 1 er\A* 1^1 A7A \aa 


M00001347A:B10 


1 9^7£ 
13D /D 


PTA AAAAA1 77AF cr 1 6 1 


M00001353A:G12 


oU /o 


OA ct\A'11A9^R Qpa 
6U.J2r3.SpO. I jUZjO.octJ 


M00001353A:G12 


oU /o 


PTA AAAAA1 77 AR 1 1^ 1 


M00001353A:G12 


oU/o 


l /z.v^j.spo. i3jyu3.oeq 


M00001353D:D10 


14929 


RTA00000177AF.m.1.2 


M00001353D:D10 


14929 


80.F3.sp6:130270.Seq 


M00001353D:D10 


14929 


172.D3.sp6:133915.Seq 


M00001361A:A05 


4141 


80.B4.sp6:130223.Seq 


M00001361A:A05 


4141 


RTA00000177AF.p.20.3 


M00001362B:D10 


5622 


80.D4.sp6:130247.Seq 


M00001362B:D10 


5622 


RTA00000178AF.a.11.1 






cDNA Library ES3 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


MAAAA1 Ifi^P-HI 1 


945 


RTA000001 78 AR a 20 1 


A/fAAAAl 1£7P'H1 1 
MUUUU1 jDZU.ril 1 




1 00 F4 Qnfv 1 1 1 471 Sen 


AyfAAAAl ^69P'141 1 


Q45 


80 F4 snfv 110259 Sea 




Q45 


180 C2 sn6- 115940 Sea 


MAAAA1 ^76R«fi06 


17712 


RTA000001 78AR i 2 2 


TVTAAAA1 176R-HA6 


17712 


80 R5 sn6* 110224 Sea 


TVyf A AAA nC7A -PA^ 
MUUUU1 30 //\.v^UJ 




SO Dfi snfv 1 1024Q Sen 


TV/TAAAA1 «PA^ 




RTA000001 7RAF n 1 R 1 
xv i t\\j\j\j\j\) i / o/A.r .ii. i o. l 


\A AAAA1 A 1 A 
IVIUUU U 1 *t 1 Z Jt> . D 1 u 




RTA000001 7QAF n 21 1 


\/fAAAA1 A1 A 


8551 


80 G7 snfv 110286 Sen 


MUOUO 1 4 1 jA:HUo 




oU.i5o.spo. l juzz /.L>eq 


M00001415A:H06 


13538 


RTA00000180AF.a.24.1 


M00001416B:H11 


8847 


80.C8.sp6:130239.Seq 


M00001416B:H11 


8847 


RTA00000180AF.b.16.1 


M00001429D:D07 


40392 


RTA00000180AF.j.8.1 


M00001429D:D07 


40392 


80.H9.sp6:130300.Seq 


M00001448D:H01 


36313 


80.A11.sp6:130218.Seq 


M00001448D:H01 


36313 


RTA00000181AF.e.23.1 

cDNA Library ES4 - ATCC# 
Deposit Date - December 22, 1998 





Clone Name 


Cluster ID 


Sequence Name 




M0000146JL:bl 1 


1 0 

iy 


12 T A AAAAA 1 h 7 1 
Ix 1 AUUUUU 1 oZ/\r .0. / . 1 




Tk if AAAA1 /J •*! „ T~> 1 1 

M00001463C:B1 1 




oy .u i .spo . i j u / u j . oeq 




M00001470A:B10 


1 A'S *7 


OA ITO fi^/;. 1 Can 

oy.rZ.spo. 1 3U /zo.oeq 




n fA A AA 1 yl^A A Tk 1 A 

M00001470A:B10 


1 A^ H 


DTAAAAAA101 A f 9 1 
K 1 AUUUUU 1 1 1 A.I.o . 1 




■» xaaaai a r\*i a . r\o 

M0000 1 497A:G02 


zoz3 


oa d onA- 1 Con 

oy,r3.spo.i3U /zy.oeq 




M00001497A:G02 


2623 


DTP A AAAAA 1 A C ^ iC 1 

K 1 AUUUUU I o J Ar .a.o. 1 




"fc /f AAAA 1 CAA A . T~" 1 1 

M00001500A:E11 


2623 


Kl AUUUUU I Ar .D. 1 4. 1 




M00001500A:E11 


2623 


OA A A n ~.£L . 1 'J A/C7 A C 

oy.A4.spo: liUo /U.oeq 




M00001501D:C02 


9685 


T> r r^ a aaaaa 1 OT A T7 « 11 1 C^^. TUT' 1 A AC A A 

RTAUU00U 1 83 AF.c. 11.1 .Seq_ 1 HC 1 UyD 44 




M00001501D:C02 


9685 


RTA00000183AF.c.11.1 




M00001501D:C02 


9685 


89.C4.sp6:130694.Seq 




M00001504C:H06 


6974 


89.F4.sp6:130730.Seq 




M00001504C:H06 


6974 


RTA00000183AF.d.9.1 


M00001504C:H06 


6974 


DHT A AAAAA 1 ft 3 AC ^ 0 1 Qan 

K 1 AUUUUU I o3 Ar .u.y . 1 .oeq_ 1 txK^LLj 1 zy 




M00001504D:G06 


6420 


173.F5.SP6:134133.Seq 




M00001504D:G06 


6420 


89.G4.sp6:130742.Seq 




M00001504D:G06 


6420 


RTA00000183AF.d.11.1.Seq_THC226443 


M00001504D:G06 


6420 


RTA00000183AF.d.11.1 


M00001528A:C04 


35555 


89.B6.sp6:130684.Seq 




M00001528A:C04 


7337 


RTA00000123A.b.17.1 




M00001528A:C04 


35555 


184.A5.sp6:135530.Seq 

cDNA Library ES5 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 

maaaai ^77n-/^A7 

MUUUU 1 J J / d.vjU / 


1180 

JJ07 


dta nonon 1 af m 10 1 

ivi /\uutfuu 1 ojr\.r .in . 17.1 


A/1AAAA1 ^'27T}«/^A7 


nao 


oy./\O.SpO. 1 JUD /4,ocCJ 


\/t aaaai ^/i 1 a 'F*a7 
MUUUU 1 J4 1 A.LfUZ 






\A AAAA1 A -TW} 
MUUUU 1341 A.LH/Z 




RTAOAAAAI IS A rl 1 1 


\aaaaai ^/mr*T3 A7 
MUUUU 1 344D.OU / 


MIA 


80 AO cnA-1lA67^ Qpn 


A/JAAAA1 ^/1yf"D*"DA7 

MUUUU I j44o.r>U / 


MIA 
Oy /4 


PTAAAAAA1 8AAT <* 1^ 1 
IV 1 /\UUUUU 1 o4/\r .a. 1 D . 1 


A/1AAAA1 *\A£k'Ci\ 1 
MUUUU 1 D40A..01 1 


1 767 




\yfAAAA1 SA£ A «fi1 1 
MUUUU 1 J40/\.VJ1 1 






M00001549B:F06 


4193 


89.G9.sp6:130747.Seq 


M00001549B:F06 


4193 


RTA00000184AF.e.13.1 


M00001556A:F11 


1577 


173.C9.SP6:134101.Seq 


M00001556A:F11 


1577 


89.F11.sp6:130737.Seq 


M00001556A:F11 


1577 


RTA00000184AF.i.23.1 


M00001556B:C08 


4386 


RTA00000184AF.j.4.1 


M00001556B:C08 


4386 


89.H11.sp6:130761.Seq 

cDNA Library ES6 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


\4aaaai *\6ivi*vf\f\ 
MUUUU 1 jOJD.ruo 


10? 

1 \JjL 


RTA000001 84AF n 5 1 


iv/rnoon i sa^r-foa 

IViUUUU 1 JUJD.rUO 


102 


90 Bl sd6 130871 Sea 


a/iaaaa i ^71 r*'i4(\£ 
MUUUU 1 j / Iv^.rlUO 




90 F1 <snfvH 0907 Sen 


mooaai S71 p-hoa 


5749 


RTA00000185AF a 19 1 


atiaaaai ^q/tr-waj. 

MUUUU 1 J y^D. xIUh 




Q0 D9 qn6*nOXQ6 Sen 


\/taaaai <Q/H3-trAi<i 
MUUUU i jy4D.riU4 


960 


RTA0000018SAR t 19 9 


a/taaaai ^Q7r i »WA9 

MUUUU Ijy /L/.ilUZ 


4817 


90 F? sn6- 110908 Sea 


AyfAAAAl co7r -, «TJA7 
MUUUU i jy /U.riUz 


ZLJH7 
*r 0 J i 


RTA000001 AR H 9 


a^taaaai Ai/ir^-FHl 
MUUUU 10Z4v/.rUl 


4/3 HQ 


90 C4 snfv 110886 Sen 


lV/fAAAA1 ^AC^'VCW 

MUUUU 10Z4L/.rUI 


410Q 

T-JV7 


RTA000001 86AF e 99 1 


H AAAA 1 CHC\ A « A AA 

MUUUU 1 0 /yA. AUo 


OOOU 




\/f AAAA 1 £7Q A • A f\A 

MUUUU 1 1> /VA. AUt) 


OODu 


199 R^ cnA- 119089 Sph 


TV if AAAA 1 i£7A A • A A£ 

MUUUU i o / vA: AUo 


OOUU 


dta A00001 87 A F h 1^ 1 
iv l /\uuuuu i o //vr .n. i j . i 


MuuOuj oybibuy 


£Q7 


OA f;8 1AQ1R Qpn 

yu.vro.spo. 1 juyjo.ocq 


M00003759B:B09 


697 


RTA00000188AF.d.6.1 


M00003759B:B09 


697 


RTA00000188AF.d.6.1.Seq_THC178884 


M00003844C:B11 


6539 


176.D9.sp6:134556.Seq 


M00003844C:B11 


6539 


RTA00000189AF.d.22.1 


M00003844C:B11 


6539 


90.B10.sp6:130880.Seq 


M00003857A:G10 


3389 


90.A11.sp6:130869.Seq 


M00003857A:G10 


3389 


RTA00000189AF.g.3.1 

cDNA Library ES7 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


M00003 9 1 4C :r U5 


i aaa 

jyuu 


yy.n i .spo. i j iz /o.oeq 


Ti A AAAA1A1 /I . T?AC 

MUUUU3914L:rU5 


1GAA 


PTA AAAAA1 OA AT7 rr 1 ^ 1 
K 1 AUUUUU 1 y U Ar .g. 1 J . I 


H /fAAArtlA^^ A PAZ" 

M00003922A:E06 


23255 


T> TP A AAAAA 1 AA A IT I A 1 

K 1 AUUUUU 1 9UAr .J.4. 1 


Tk //\AAAO A . I - ' A /T 

M00003922A:E06 


23255 


AO CI 111 OAA C 

99. r I.spo: 131Z9U.k>eq 


M00003922A:E06 


23255 


OT* A AAAAA1 AA A 17 ! /I 1 O^r-i TUfTO QHH£ 

K 1 AUUUUU 1 9UAr J.4. 1 .oeq_ 1 HUZZa / /o 


"X /AAAA1 AOI A . A A^ 

M00003983A:A05 


a 1 ac 
y 1U5 


yy.C3.spo: i J iZ5o.beq 


A i AAAA1 AOO A A AT 

M00003983A:A05 


A1 AC 

91U5 


DTAAAAAA1A1 A I? n Ol O 

K 1 AUUUUU 1 9 1 Ar .a.Z 1 .z 


M00004028D:A06 


6124 


RTA00000191AR.e.2.3 


M00004028D:A06 


6124 


99.D3.sp6:131268.Seq 


M00004031A:A12 


9061 


RTA00000191AR.e.11.2 


M00004031A:A12 


9061 


RTA00000191AR.e.11.3 


M00004087D:A01 


6880 


RTA00000191AF.m.20.1 


M00004087D:A01 


6880 


99.A5.sp6:131234.Seq 


M00004108A:E06 


4937 


99.E5.sp6:131282.Seq 


M00004108A:E06 


4937 


RTA00000191AF.p.21.1 


M00004114C:F11 


13183 


123.D5.sp6:132305.Seq 


M00004114C:F11 


13183 


RTA00000192AF.a.24.1 


M00004114C:F11 


13183 


99.G5.sp6:131306.Seq 

cDNA Library ES8 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 

M00004146C:C11 


jZj / 


00 <snfv 1 3 1 947 Sen 

77.DO.apO. 1 J 1 i-^t / .lJC^ 


M00004146C:C11 


^7^7 
JZD / 


177 PS Qn6*n47fi8 Sen 


M00004146C:C11 


<0^7 
DZD / 




M00004146C:C11 


jZj / 


RTAfinn0010?AF f ^ 1 Sen THC213833 


M00004157C:A09 




PTAnnnnniQ7 af <* 71 1 

IV 1 /\UUUUU 1 7ZAr .g.^-> « 1 


M00004157C:A09 


6455 


99.D6.sp6:131271.Seq 


M00004157C:A09 


6455 


123.E7.sp6:132319.Seq 


M00004172C:D08 


11494 


RTA00000192AF.j.6.1 


M00004172C:D08 


11494 


99.G6.sp6:131307.Seq 


M00004172C:D08 


11494 


177.E6.sp6:134757.Seq 


M00004229B:F08 


6455 


RTA00000193AF.b.9.1 


M00004229B:F08 


6455 


99.C8.sp6:131261.Seq 

cDNA Library ES9 - ATCC# 
Deposit Date - December 22, 1998 

Clone Name        Cluster ID   Sequence Name
M00001466A:E07    4275         RTA00000120A.j.14.1
M00001531A:H11                 89.F6.sp6:130732.Seq
M00001531A:H11                 RTA00000123A.g.19.1
M00001551A:B10    6268         79.G9.sp6:130096.Seq
M00001551A:B10    6268         184.C12.sp6:135561.Seq
M00001551A:B10    6268         RTA00000126A.o.23.1
M00001552A:B12    307          RTA00000136A.o.4.2
M00001552A:B12    307          79.C7.sp6:130046.Seq
M00001556A:H01    15855        RTA00000184AF.j.1.1
M00001586C:C05    4623         RTA00000185AF.f.4.1
M00001604A:B10    1399         79.G8.sp6:130095.Seq
M00001604A:B10    1399         RTA00000129A.o.10.1
M00003879B:C11    5345         RTA00000189AF.l.19.1
M00003879B:C11    5345         90.B12.sp6:130882.Seq

cDNA Library ES10 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 








RTA00000177AF o 4 1 


IVlUv/Uu 1 jOOL/.UuJ 


5832 


80 F6 sd6* 130273 Sea 


IV1UUUU 1 jOOL'.vJUj 


5Jtt? 


RTA00000178AF o 23 1 




UJOJ 


RTA00000179AF d 13 1 




OjOj 


i /z.Do.spo. 1 J JOVO.oeCJ 


M00001394A:F01 


6583 


80.H6.sp6:130297.Seq 


M00001429A:H04 


2797 


RTA00000180AF.U9.1 


M00001447A:G03 


10717 


RTA00000181AF.d.10.1 


M00001448D:C09 


8 


80.H10.sp6:130301.Seq 


M00001448D:C09 


8 


RTA00000181AF.e.17.1 


M00001448D:C09 


8 


100.B11.sp6:131444.Seq 


M00001454D:G03 


689 


RTA00000181AR.l.22.1 

cDNA Library ES11 - ATCC# 
Deposit Date - December 22, 1998 

Clone Name        Cluster ID   Sequence Name
M00003975A:G11    12439        RTA00000190AF.o.24.1
M00003978B:G05    5693         RTA00000190AF.p.17.2.Seq_THC173318
M00003978B:G05    5693         RTA00000190AF.p.17.2
M00004059A:D06    5417         RTA00000191AF.h.19.1
M00004068B:A01    3706         99.C4.sp6:131257.Seq
M00004068B:A01    3706         RTA00000191AF.i.17.2
M00004205D:F06                 99.E7.sp6:131284.Seq
M00004205D:F06                 177.G7.sp6:134782.Seq
M00004205D:F06                 RTA00000192AF.o.11.1
M00004212B:C07    2379         RTA00000192AF.p.8.1
M00004223A:G10    16918        RTA00000193AF.a.16.1

cDNA Library ES12 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00004223B:D09 

M00004249D:G12 

M00004251C:G07 

M00004372A:A03 



Cluster ID 
7899 



2030 



Sequence Name 

RTA00000193AF.a.17.1 

RTA00000193AF.c.22.1 

RTA00000193AF.d.2.1 

RTA00000193AF.m.20.1 

cDNA Library ES13 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


M00001340B:A06 


17062 


80.Al.sp6:130208.Seq 


M00001340B:A06 


17062 


RTA00000177AF.b.8.4 


M00001340D:F10 


11589 


80.Bl.sp6:130220.Seq 


M00001340D:F10 


11589 


RTA00000177AF.b.l7.4 


M00001341A:E12 


4443 


80.Cl.sp6:130232.Seq 


M00001341A:E12 


4443 


RTA00000177AF.b.20.4 


M00001342B:E06 


39805 


80.Dl.sp6:130244.Seq 


M00001342B:E06 


39805 


RTA00000177AF.C.21.3 


M00001346A:F09 


5007 


RTA00000177AF.g.2.1 


M00001346A:F09 


5007 


80.Hl.sp6:130292.Seq 


M00001346D:G06 


5779 


RTA00000177AF.g.14.3 


M00001346D:G06 


5779 


RTA00000177AF.g.l4.1 


M00001348B:B04 


16927 


80.E2.sp6:130257.Seq 


M00001348B:B04 


16927 


RTA00000177AF.h.9.3 


M00001348B:G06 


16985 


RTA00000177AF.h.l0.1 


M00001348B:G06 


16985 


80.F2.sp6:130269.Seq 


M00001349B:B08 


3584 


RTA00000177AF.h.20.1 


M00001349B:B08 


3584 


80.G2.sp6:130281.Seq 


M00001350A:H01 


7187 


100.C2.sp6:131447.Seq 


M00001350A:H01 


7187 


80.A3.sp6:130210.Seq 


M00001350A:H01 


7187 


RTA00000177AF.i.8.2 


M00001352A:E02 


16245 


RTA00000177AF.k.9.3 


M00001352A:E02 


16245 


172.D2.sp6:133914.Seq 


M00001352A:E02 


16245 


80.D3.sp6:130246.Seq 


M00001355B:G10 


14391 


RTA00000177AF.m.l7.3 


M00001355B:G10 


14391 


80.G3.sp6:130282.Seq 


M00001355B:G10 


14391 


172.H3.sp6:133963.Seq 


M00001355B:G10 


14391 


100.E3.sp6:131472.Seq 


M00001361D:F08 


2379 


80.C4.sp6:130235.Seq 


M00001361D:F08 


2379 


RTA00000178AF.a.6.1 


M00001365C:C10 


40132 


RTA00000178AF.C.7.1 


M00001365C:C10 


40132 


80.F4.sp6:130271.Seq 


M00001368D:E03 




80.G4.sp6:130283.Seq 


M00001368D:E03 




RTA00000178AF.d.20.1 


M00001370A:C09 


6867 


80.H4.sp6:130295.Seq 


M00001370A:C09 


6867 


RTA00000178AF.e.12.1 


M00001371C:E09 


7172 


100.A5.sp6:131426.Seq 


M00001371C:E09 


7172 


RTA00000178AF.f.9.1 


M00001371C:E09 


7172 


80.A5.sp6:130212.Seq 


M00001378B:B02 


39833 


80.C5.sp6:130236.Seq 


M00001378B:B02 


39833 


RTA00000178AF.L23.1 


M00001379A:A05 


1334 


80.D5.sp6:130248.Seq 


M00001379A:A05 


1334 


RTA00000178AF.j.7.1 


M00001380D:B09 


39886 


RTA00000178AF.j.24.1 


M00001380D:B09 


39886 


80.E5.sp6:130260.Seq 


M00001381D:E06 




80.F5.sp6:130272.Seq 


M00001381D:E06 




RTA00000178AF.L16.1 


M00001382C:A02 


22979 


80.G5.sp6:130284.Seq 

cDNA Library ES13 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00001382C:A02 

M00001384B:A11 

M00001384B:A11 

M00001386C:B12 

M00001386C:B12 

M00001387B:G03 

M00001387B:G03 

M00001389A:C08 

M00001389A:C08 

M00001396A:C03 

M00001396A:C03 

M00001396A:C03 

M00001400B:H06 

M00001400B:H06 

M00001400B:H06 

M00001400B:H06 

M00001402A:E08 

M00001402A:E08 

M00001407B:D11 

M00001407B:D11 

M00001410A:D07 

M00001410A:D07 

M00001410A:D07 

M00001414A:B01 

M00001414A:B01 

M00001414C:A07 

M00001414C:A07 

M00001416A:H01 

M00001416A:H01 

M00001417A:E02 

M00001417A:E02 

M00001423B:E07 

M00001423B:E07 

M00001424B:G09 

M00001424B:G09 

M00001425B:H08 

M00001425B:H08 

M00001426B:D12 

M00001426B:D12 

M00001426D:C08 

M00001426D:C08 

M00001428A:H10 

M00001428A:H10 
M00001428A:H10 
M00001449A:A12 
M00001449A:A12 
M00001449A:B12 
M00001449A:B12 



Cluster ID 
22979 



5178 

5178 

7587 

7587 

16269 

16269 

4009 

4009 

4009 



39563 

39563 

5556 

5556 

7005 

7005 

7005 



7674 

7674 

36393 

36393 

15066 

15066 

10470 

10470 

22195 

22195 



4261 
4261 

84182 

84182 

84182 

5857 

5857 

41633 

41633 



Sequence Name 

RTA00000178AF.k.22.1 

80.B6.sp6:130225.Seq 

RTA00000178AF.m.l3.1 

80.C6.sp6:130237.Seq 

RTA00000178AF.n.l0.1 

80.E6.sp6:130261.Seq 

RTA00000178AF.n.24.1 

RTA00000178AF.p.l.l 

80.G6.sp6:130285.Seq 

172.D8.sp6:133920.Seq 

80.A7.sp6:130214.Seq 

RTA00000179AF.e.20.1 

172.B9.sp6:133897.Seq 

80.B7.sp6:130226.Seq 

RTA00000179AF.j.l3.1 

RTA00000179AF.j.13.1.Seq_THC105720 

80.C7.sp6:130238.Seq 

RTA00000179AF.k.20.1 

RTA00000179AF.n.l0.1 

80.D7.sp6:130250.Seq 

180.H5.sp6:136003.Seq 

RTA00000179AF.O.22.1 

80.F7.sp6:130274.Seq 

RTA00000180AF.a.9.1 

80.H7.sp6:130298.Seq 

80.A8.sp6:130215.Seq 

RTA00000180AF.a.ll.l 

79.C1.sp6:130040.Seq 
RTA00000118A.g.9.1 
RTA00000180AF.C.2.1 

80.D8.sp6:130251.Seq 
RTA00000180AF.e.24.1 
80.H8.sp6:130299.Seq 
80.A9.sp6:130216.Seq 
RTA00000180AF.f.l8.1 
RTA00000180AF.g.7.1 
80.B9.sp6:130228.Seq 
RTA00000180AF.g.22.1 
80.C9.sp6:130240.Seq 
80.D9.sp6:130252.Seq 
RTA00000180AF.h.5.1 
100.G9.sp6:131502.Seq 

RTA00000180AF.h.l9.1 

80.E9.sp6:130264.Seq 

80.Bll.sp6:130230.Seq 

RTA00000118A.g.l4.1 

80.Cll.sp6:130242.Seq 

RTA00000118A.g.16.1 

cDNA Library ES13 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


M00001449A:G10 


36535 


RTA00000181AF.f.5.1 


M00001449A:G10 


36535 


80.Dll.sp6:130254.Seq 


M00001449A:G10 


36535 


100.D11.sp6:131468.Seq 


M00001449C:D06 


86110 


RTA00000181AF.f.l2.1 


M00001449C:D06 


86110 


80.Ell.sp6:130266.Seq 


M00001450A:A02 


39304 


RTA00000118A.j.21.1.Seq_THC151859 


M00001450A:A02 


39304 


RTA00000118A.j.21.1 


M00001450A:A02 


39304 


79.Fl.sp6:130076.Seq 


M00001450A:A02 


39304 


180.G9.sp6:135995.Seq 


M00001450A:A11 


32663 


80.Fll.sp6:130278.Seq 


M00001450A:A11 


32663 


RTA00000118A.1.8.1 


M00001450A:B12 


82498 


100.Fll.sp6:131492.Seq 


M00001450A:B12 


82498 


RTA00000118A.m.l0.1 


M00001450A:B12 


82498 


79.Gl.sp6:130088.Seq 


M00001450A:D08 


27250 


80.Gll.sp6:130290.Seq 


M00001450A:D08 


27250 


180.B10.sp6:135936.Seq 


M00001450A:D08 


27250 


RTA00000181AF.g.l0.1 


M00001452A:B04 


84328 


RTA00000118A.p.l0.1 


M00001452A:B04 


84328 


79.A2.sp6:130017.Seq 


M00001452A:B12 


86859 


RTA00000118A.p.8.1 


M00001452A:B12 


86859 


79.B2.sp6:130029.Seq 


M00001452A:F05 


85064 


RTA00000131A.m.23.1 


M00001452A:F05 


85064 


79.D2.sp6:130053.Seq 


M00001452C:B06 


16970 


80.Hll.sp6:130302.Seq 


M00001452C:B06 


16970 


100.C12.sp6:131457.Seq 


M00001452C:B06 


16970 


RTA00000181AR.U8.2 


M00001453A:E11 


16130 


80.A12.sp6:130219.Seq 


M00001453A:E11 


16130 


100.D12.sp6:131469.Seq 


M00001453A:E11 


16130 


RTA00000119A.C.13.1 


M00001453C:F06 


16653 


80.B12.sp6:130231.Seq 


M00001453C:F06 


16653 


RTA00000181AF.k.5.3 


M00001454A:A09 


83103 


RTA00000119A.e.24.2 


M00001454A:A09 


83103 


79.G2.sp6:130089.Seq 


M00001454B:C12 


7005 


121.Dl.sp6:131917.Seq 


M00001454B:C12 


7005 


RTA00000181AF.k.24.1 


M00001454B:C12 


7005 


80.C12.sp6:130243.Seq 


M00001455B:E12 


13072 


80.F12.sp6:130279.Seq 


M00001455B:E12 


13072 


RTA00000181AR.m.5.2 


M00001460A:F06 


2448 


89.Al.sp6:130667.Seq 


M00001460A:F06 


2448 


RTA00000119A.j.21.1 


M00001461A:D06 


1531 


89.C1.sp6:130691.Seq 


M00001461A:D06 


1531 


RTA00000119A.O.3.1 


M00001465A:B11 


10145 


79.F3.sp6:130078.Seq 


M00001465A:B11 


10145 


RTA00000120A.g.l2.1 


M00001467A:B07 


38759 


89.Fl.sp6:130727.Seq 


M00001467A:B07 


38759 


RTA00000120A.m.12.3 


M00001467A:D04 


39508 


RTA00000120A.O.2.1 


M00001467A:D04 


39508 


89.G1.sp6:130739.Seq 

cDNA Library ES13 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00001467A:E10 

M00001467A:E10 

M00001468A:F05 

M00001468A:F05 

M00001469A:A01 

M00001469A:A01 

M00001469A:C10 

M00001469A:C10 

M00001469A:H12 

M00001469A:H12 

M00001470A:C04 

M00001470A:C04 

M00001471A:B01 

M00001471A:B01 

M00001487B:H06 

M00001487B:H06 

M00001488B:F12 

M00001488B:F12 

M00001494D:F06 

M00001494D:F06 

M00001499B:A11 

M00001499B:A11 

M00001499B:A11 

M00001500A:C05 

M00001500A:C05 

M00001504A:E01 

M00001504A:E01 

M00001504A:E01 

M00001504C:A07 

M00001504C:A07 

M00001505C:C05 

M00001505C:C05 

M00001506D:A09 

M00001506D:A09 

M00001506D:A09 

M00001507A:H05 

M00001507A:H05 

M00001535A:F10 

M00001535A:F10 

M00001541A:H03 

M00001541A:H03 

M00001544A:G02 

M00001544A:G02 

M00001545A:D08 

M00001545A:D08 

M00001551A:F05 

M00001551A:F05 

M00001552A:D11 



Cluster ID 

39442 

39442 

7589 

7589 



12081 
12081 
19105 
19105 
39425 
39425 
39478 
39478 



7206 

7206 

10539 

10539 

10539 

5336 

5336 



10185 
10185 



39168 
39168 
39423 
39423 
39174 
39174 
19829 
19829 
13864 
13864 
39180 
39180 
39458 



Sequence Name 

89.A2.sp6:130668.Seq 

RTA00000120A.O.21.1 

RTA00000120A.p.23.1 

89.B2.sp6:130680.Seq 

RTA00000121A.C.10.1 

89.C2.sp6:130692.Seq 

89.D2.sp6:130704.Seq 

RTA00000133A.d.l4.2 

89.E2.sp6:130716.Seq 

RTA00000133A.e.l5.1 

89.G2.sp6:130740.Seq 

RTA00000133A.f.l.l 

89.H2.sp6:130752.Seq 

RTA00000133A.i.5.1 

RTA00000182AF.1.15.1 

89.B3.sp6:130681.Seq 

RTA00000182AF.1.20.1 

89.C3.sp6:130693.Seq 

RTA00000182AF.O.15.1 

89.E3.sp6:130717.Seq 

RTA00000183AF.a.24.1 

89.G3.sp6:130741.Seq 

173.B5.SP6:134085.Seq 

RTA00000183AF.b.l3.1 

89.H3.sp6:130753.Seq 

RTA00000183AF.C.24.1 

89.D4.sp6:130706.Seq 

RTA00000183AF.c.24.1.Seq_THC125912 

RTA00000183AF.d.5.1 

89.E4.sp6:130718.Seq 

89.H4.sp6:130754.Seq 

RTA00000183AF.e.l.l 

89.A5.sp6:130671.Seq 

RTA00000183AF.e.23.1 

121.G6.sp6:131958.Seq 

RTA00000121A.1.10.1 

89.B5.sp6:130683.Seq 

79.C5.sp6:130044.Seq 

RTA00000134A.k.22.1 

79.E5.sp6:130068.Seq 

RTA00000124A.n.l3.1 

79.H5.sp6:130104.Seq 

RTA00000125A.h.24.4 

RTA00000125A.m.9.1 

79.B6.sp6:130033.Seq 

RTA00000126A.n.8.2 

79.A7.sp6:130022.Seq 

RTA00000126A.p.15.2 

cDNA Library ES13 - ATCC# 
Deposit Date - December 22, 1998 

Clone Name Cluster ID Sequence Name 

M00001552A:D11 39458 79.D7.sp6:130058.Seq 

M00001557A:F03 39490 RTA00000128A.b.4.1 

cDNA Library ES14 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00001511A:H06 

M00001511A:H06 

M00001512A:A09 

M00001512A:A09 

M00001512D:G09 

M00001512D:G09 

M00001512D:G09 

M00001513B:G03 

M00001513B:G03 

M00001513B:G03 

M00001513C:E08 

M00001513C:E08 

M00001514C:D11 

M00001514C:D11 

M00001514C:D11 

M00001518C:B11 

M00001518C:B11 

M00001528B:H04 

M00001528B:H04 

M00001531A:D01 

M00001531A:D01 

M00001534A:C04 

M00001534A:C04 

M00001534A:D09 

M00001534A:D09 

M00001534C:A01 

M00001534C:A01 

M00001535A:C06 

M00001535A:C06 

M00001535A:C06 

M00001536A:B07 

M00001536A:B07 

M00001537A:F12 

M00001537A:F12 

M00001540A:D06 

M00001540A:D06 

M00001542A:E06 

M00001542A:E06 

M00001544A:E06 

M00001544A:E06 

M00001544A:E06 

M00001545A:B02 

M00001545A:B02 

M00001548A:E10 

M00001548A:E10 

M00001548A:E10 

M00001549C:E06 

M00001549C:E06 



Cluster ID 

39412 

39412 

39186 

39186 

3956 

3956 

3956 



14364 

14364 

40044 

40044 

40044 

8952 

8952 

8358 

8358 

38085 

38085 

16921 

16921 

5097 

5097 

4119 

4119 

20212 

20212 

20212 

2696 

2696 

39420 

39420 

8286 

8286 

39453 

39453 



5892 

5892 

5892 

16347 

16347 



Sequence Name 

RTA00000133A.k.l7.1 

89.C5.sp6:130695.Seq 

89.D5.sp6:130707.Seq 

RTA00000121A.p.l5.1 

89.E5.sp6:130719.Seq 

173.H5.SP6:134157.Seq 

RTA00000183AF.g.3.1 

RTA00000183AF.g.9.1 

89.F5.sp6:130731.Seq 

RTA00000183AF.g.9.1.Seq_THC198280 

RTA00000183AF.g.l2.1 

89.G5.sp6:130743.Seq 

RTA00000183AF.g.22.1 

RTA00000183AF.g.22.1.Seq_THC232899 

89.H5.sp6:130755.Seq 

89.A6.sp6:130672.Seq 

RTA00000183AF.h.l5.1 

89.D6.sp6:130708.Seq 

RTA00000183AF.i.5.1 

RTA00000123A.e.l5.1 

89.E6.sp6:130720.Seq 

RTA00000183AF.k.6.1 

89.H6.sp6:130756.Seq 

RTA00000134A.k.l.l 

RTA00000134A.k.1.1.Seq_THC215869 

RTA00000183AF.k.l6.1 

89.C7.sp6:130697.Seq 

89.E7.sp6:130721.Seq 

RTA00000134A.l.22.1.Seq_THC128232 

RTA00000134A.1.22.1 

RTA00000134A.m.l3.1 

89.F7.sp6:130733.Seq 

89.H7.sp6:130757.Seq 

RTA00000134A.O.23.1 

89.B8.sp6:130686.Seq 

RTA00000183AF.O.1.1 

89.E8.sp6:130722.Seq 

RTA00000135A.g.ll.l 

RTA00000184AF.a.8.1 

173.G7.SP6:134147.Seq 

89.H8.sp6:130758.Seq 

89.B9.sp6:130687.Seq 

RTA00000135A.1.2.2 

89.E9.sp6:130723.Seq 

RTA00000184AF.d.ll.l 

RTA00000184AF.d.11.1.Seq_THC161896 

89.H9.sp6:130759.Seq 

RTA00000184AF.e.15.1 

cDNA Library ES14 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


M00001550A:A03 


7239 


89.A10.sp6:130676.Seq 


M00001550A:A03 


7239 


RTA00000126A.m.4.2 


M00001550A:G01 


5175 


RTA00000184AF.O.1 


M00001550A:G01 


5175 


89.B10.sp6:130688.Seq 


M00001551A:G06 


22390 


RTA00000136A.j.l3.1 


M00001551A:G06 


22390 


89.C10.sp6:130700.Seq 


M00001551C:G09 


3266 


RTA00000184AR.g.l.l 


M00001551C:G09 


3266 


89.D10.sp6:130712.Seq 


M00001553A:H06 


8298 


RTA00000127A.d.l9.1 


M00001553A:H06 


8298 


89.G10.sp6:130748.Seq 


M00001553B:F12 


4573 


89.H10.sp6:130760.Seq 


M00001553B:F12 


4573 


RTA00000184AF.h.9.1 


M00001555A:B02 


39539 


RTA00000127A.i.21.1 


M00001555A:B02 


39539 


89.Bll.sp6:130689.Seq 


M00001555A:C01 


39195 


89.C11.sp6:130701.Seq 


M00001555A:C01 


39195 


RTA00000137A.C.16.1 


M00001555D:G10 


4561 


RTA00000184AF.i.21.1 


M00001555D:G10 


4561 


89.Dll.sp6:130713.Seq 


M00001556A:C09 


9244 


89.Ell.sp6:130725.Seq 


M00001556A:C09 


9244 


RTA00000127A.1.3.1 


M00001556B:G02 


11294 


RTA00000184AF.j.6.1 


M00001556B:G02 


11294 


89.A12.sp6:130678.Seq 


M00001557B:H10 


5192 


173.E9.SP6:134125.Seq 


M00001557B:H10 


5192 


RTA00000184AF.k.2.1 


M00001557B:H10 


5192 


89.D12.sp6:130714.Seq 


M00001557D:D09 


8761 


RTA00000184AF.k.l2.1 


M00001557D:D09 


8761 


89.E12.sp6:130726.Seq 


M00001558B:H11 


7514 


RTA00000184AF.k.21.1 


M00001558B:H11 


7514 


89.G12.sp6:130750.Seq 


M00001559B:F01 




89.H12.sp6:130762.Seq 


M00001559B:F01 




RTA00000184AF.1.11.1 


M00001560D:F10 


6558 


90.Al.sp6:130859.Seq 


M00001560D:F10 


6558 


RTA00000184AF.m.21.1 


M00001566B:D11 




RTA00000184AF.p.3.1 


M00001566B:D11 




90.D1.sp6:130895.Seq 


M00001583D:A10 


6293 


RTA00000185AF.e.11.1 


M00001583D:A10 


6293 


90.A2.sp6:130860.Seq 


M00001590B:F03 




RTA00000185AF.g.11.1 


M00001590B:F03 




90.C2.sp6:130884.Seq 


M00001597D:C05 


10470 


RTA00000185AF.k.6.1 


M00001597D:C05 


10470 


90.F2.sp6:130920.Seq 


M00001598A:G03 


16999 


90.G2.sp6:130932.Seq 


M00001598A:G03 


16999 


RTA00000185AF.k.9.1 


M00001601A:D08 


22794 


RTA00000138A.b.5.1 


M00001601A:D08 


22794 


90.H2.sp6:130944.Seq 


M00001607A:E11 


11465 


RTA00000185AF.m.19.1 


M00001607A:E11 


11465 


90.A3.sp6:130861.Seq 


M00001608A:B03 


7802 


RTA00000185AF.n.5.1 

cDNA Library ES14 - ATCC# 
Deposit Date - December 22, 1998 

Clone Name Cluster ID 

M00001608A:B03 7802 

M00001608B:E03 22155 

M00001608B:E03 22155 
M00001608D:A11 
M00001608D:A11 

M00001614C:F10 13157 

M00001614C:F10 13157 

M00001617C:E02 17004 

M00001617C:E02 17004 

M00001619C:F12 40314 

M00001619C:F12 40314 

M00001621C:C08 40044 

M00001621C:C08 40044 

M00001621C:C08 40044 

M00001621C:C08 40044 

M00001623D:F10 13913 

M00001623D:F10 13913 
M00001632D:H07 
M00001632D:H07 
M00001632D:H07 
M00001632D:H07 

M00001644C:B07 39171 

M00001644C:B07 39171 

M00001644C:B07 39171 

M00001645A:C12 19267 

M00001645A:C12 19267 

M00001645A:C12 19267 

M00001645A:C12 19267 

M00001648C:A01 4665 

M00001648C:A01 4665 

M00001657D:C03 23201 

M00001657D:C03 23201 

M00001657D:F08 76760 

M00001657D:F08 76760 

M00001662C:A09 23218 

M00001662C:A09 23218 

M00001663A:E04 35702 

M00001663A:E04 35702 

M00001669B:F02 6468 

M00001669B:F02 6468 

M00001670C:H02 14367 

M00001670C:H02 14367 

M00001673C:H02 7015 

M00001673C:H02 7015 

M00001675A:C09 8773 

M00001675A:C09 8773 

M00001675A:C09 8773 

M00001676B:F05 11460 



Sequence Name 

90.B3.sp6:130873.Seq 

RTA00000185AF.n.9.1 

90.C3.sp6:130885.Seq 

RTA00000185AF.n.12.1 

90.D3.sp6:130897.Seq 

RTA00000186AF.a.6.1 

90.E3.sp6:130909.Seq 

RTA00000186AF.b.21.1 

90.F3.sp6:130921.Seq 

90.G3.sp6:130933.Seq 

RTA00000186AF.c.15.1 

RTA00000186AF.d.1.1 

RTA00000186AF.d.1.1.Seq_THC232899 

90.H3.sp6:130945.Seq 

122.E1.sp6:132121.Seq 

RTA00000186AF.e.6.1 

90.A4.sp6:130862.Seq 

RTA00000186AF.h.14.1.Seq_THC112525 

RTA00000186AF.h.14.1 

90.E4.sp6:130910.Seq 

176.A3.sp6:134514.Seq 

RTA00000186AF.1.7.1 

90.F4.sp6:130922.Seq 

217.A12.sp6:139369.Seq 

RTA00000186AF.1.12.1.Seq_THC178183 

176.G3.sp6:134586.Seq 

RTA00000186AF.1.12.1 

90.G4.sp6:130934.Seq 
90.H4.sp6:130946.Seq 

RTA00000186AF.m.3.1 

RTA00000187AF.a.14.1 

90.B5.sp6:130875.Seq 

90.C5.sp6:130887.Seq 

RTA00000187AF.a.15.1 

RTA00000187AR.c.5.2 

90.D5.sp6:130899.Seq 

90.E5.sp6:130911.Seq 

RTA00000187AR.c.15.2 

90.F5.sp6:130923.Seq 

RTA00000187AF.d.15.1 

90.G5.sp6:130935.Seq 

RTA00000187AF.e.8.1 

90.H5.sp6:130947.Seq 

RTA00000187AF.f.18.1 

RTA00000187AF.f.24.1 

90.A6.sp6:130864.Seq 

RTA00000187AF.f.24.1.Seq_THC220002 

RTA00000187AF.g.12.1 

cDNA Library ES14 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00001676B:F05 

M00001676B:F05 

M00001677D:A07 

M00001677D:A07 

M00001677D:A07 

M00001678D:F12 

M00001678D:F12 

M00001679A:F10 

M00001679A:F10 

M00001679B:F01 

M00001679B:F01 

M00001680D:F08 

M00001680D:F08 

M00001680D:F08 

M00001682C:B12 

M00001682C:B12 

M00001682C:B12 

M00001688C:F09 

M00001688C:F09 

M00001693C:G01 

M00001693C:G01 

M00001716D:H05 

M00001716D:H05 

M00003741D:C09 

M00003741D:C09 

M00003747D:C05 

M00003747D:C05 

M00003747D:C05 

M00003747D:C05 

M00003754C:E09 

M00003754C:E09 

M00003761D:A09 

M00003761D:A09 

M00003761D:A09 

M00003762C:B08 

M00003762C:B08 

M00003762C:B08 

M00003763A:F06 

M00003763A:F06 

M00003774C:A03 

M00003774C:A03 

M00003774C:A03 

M00003784D:D12 

M00003784D:D12 

M00003839A:D08 

M00003839A:D08 

M00003851B:D08 

M00003851B:D08 



Cluster ID 

11460 

11460 

7570 

7570 

7570 

4416 

4416 

26875 

26875 

6298 

6298 

10539 

10539 

10539 

17055 

17055 

17055 

5382 

5382 

4393 

4393 

67252 

67252 

40108 

40108 

11476 

11476 

11476 

11476 



17076 

17076 

17076 

3108 

3108 

67907 

67907 

67907 



7798 
7798 



Sequence Name 

90.B6.sp6:130876.Seq 

219.F2.sp6:139035.Seq 

90.D6.sp6:130900.Seq 

RTA00000187AF.g.24.1 

RTA00000187AF.g.24.1.Seq_THC168636 

90.E6.sp6:130912.Seq 

RTA00000187AF.h.13.1 

RTA00000187AF.i.1.1 

90.A7.sp6:130865.Seq 

90.B7.sp6:130877.Seq 

RTA00000187AR.i.10.2 

90.F7.sp6:130925.Seq 

219.F6.sp6:139039.Seq 

RTA00000187AF.1.7.1 

90.G7.sp6:130937.Seq 

RTA00000187AF.m.3.1 

176.D6.sp6:134553.Seq 

90.A8.sp6:130866.Seq 

RTA00000187AF.m.23.2 

RTA00000187AF.n.17.1 

90.B8.sp6:130878.Seq 

RTA00000187AF.o.6.1 

90.C8.sp6:130890.Seq 

90.D8.sp6:130902.Seq 

RTA00000187AF.o.24.1 

RTA00000187AF.p.19.1 

90.E8.sp6:130914.Seq 

RTA00000187AF.p.19.1.Seq_THC108482 

219.H8.sp6:139065.Seq 

90.F8.sp6:130926.Seq 

RTA00000188AF.b.12.1 

RTA00000188AF.d.11.1 

90.H8.sp6:130950.Seq 

RTA00000188AF.d.11.1.Seq_THC212094 

RTA00000188AF.d.21.1.Seq_THC208760 

90.A9.sp6:130867.Seq 

RTA00000188AF.d.21.1 

RTA00000188AF.d.24.1 

90.B9.sp6:130879.Seq 

RTA00000188AF.g.11.1.Seq_THC123222 

RTA00000188AF.g.11.1 

90.C9.sp6:130891.Seq 

RTA00000188AF.i.8.1 

90.D9.sp6:130903.Seq 

RTA00000189AF.c.18.1 

90.A10.sp6:130868.Seq 

90.D10.sp6:130904.Seq 

RTA00000189AF.f.7.1 

cDNA Library ES14 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


M00003851B:D10 


13595 


90.E10.sp6:130916.Seq 


M00003851B:D10 


13595 


RTA00000189AF.f.8.1 


M00003853A:D04 


5619 


90.F10.sp6:130928.Seq 


M00003853A:D04 


5619 


RTA00000189AF.f.17.1 


M00003853A:F12 


10515 


90.G10.sp6:130940.Seq 


M00003853A:F12 


10515 


RTA00000189AF.f.18.1 


M00003856B:C02 


4622 


90.H10.sp6:130952.Seq 


M00003856B:C02 


4622 


RTA00000189AF.g.1.1 


M00003857A:H03 


4718 


90.B11.sp6:130881.Seq 


M00003857A:H03 


4718 


RTA00000189AF.g.5.1.Seq_THC196102 


M00003857A:H03 


4718 


RTA00000189AF.g.5.1 

cDNA Library ES15 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 


Cluster ID 


Sequence Name 


M00003867A:D10 




90.C11.sp6:130893.Seq 


M00003867A:D10 




RTA00000189AF.h.17.1 


M00003871C:E02 


4573 


RTA00000189AF.j.12.1 


M00003875C:G07 


8479 


90.G11.sp6:130941.Seq 


M00003875C:G07 


8479 


RTA00000189AF.j.22.1 


M00003875D:D11 




90.H11.sp6:130953.Seq 


M00003875D:D11 




RTA00000189AF.j.23.1 


M00003876D:E12 


7798 


90.A12.sp6:130870.Seq 


M00003876D:E12 


7798 


RTA00000189AF.k.12.1 


M00003906C:E10 


9285 


90.H12.sp6:130954.Seq 


M00003906C:E10 


9285 


RTA00000190AF.d.7.1 


M00003907D:A09 


39809 


99.A1.sp6:131230.Seq 


M00003907D:A09 


39809 


RTA00000190AF.e.3.1.Seq_THC150217 


M00003907D:A09 


39809 


RTA00000190AF.e.3.1 


M00003907D:H04 


16317 


99.B1.sp6:131242.Seq 


M00003907D:H04 


16317 


RTA00000190AF.e.6.1 


M00003909D:C03 


8672 


RTA00000190AF.f.11.1 


M00003909D:C03 


8672 


99.C1.sp6:131254.Seq 


M00003968B:F06 


24488 


RTA00000190AF.n.16.1 


M00003968B:F06 


24488 


99.C2.sp6:131255.Seq 


M00003970C:B09 


40122 


RTA00000190AF.n.23.1 


M00003970C:B09 


40122 


RTA00000190AF.n.23.1.Seq_THC109227 


M00003970C:B09 


40122 


99.D2.sp6:131267.Seq 


M00003974D:E07 


23210 


RTA00000190AF.o.20.1 


M00003974D:E07 


23210 


RTA00000190AF.o.20.1.Seq_THC207240 


M00003974D:E07 


23210 


99.E2.sp6:131279.Seq 


M00003974D:H02 


23358 


RTA00000190AF.o.21.1.Seq_THC207240 


M00003974D:H02 


23358 


RTA00000190AF.o.21.1 


M00003974D:H02 


23358 


99.F2.sp6:131291.Seq 


M00003981A:E10 


3430 


99.A3.sp6:131232.Seq 


M00003981A:E10 


3430 


RTA00000191AF.a.9.1 


M00003982C:C02 


2433 


RTA00000191AF.a.15.2 


M00003982C:C02 


2433 


99.B3.sp6:131244.Seq 


M00003982C:C02 


2433 


RTA00000191AF.a.15.2.Seq_THC79498 


M00004028D:C05 


40073 


RTA00000191AF.e.3.1 


M00004028D:C05 


40073 


99.E3.sp6:131280.Seq 


M00004035C:A07 


37285 


99.H3.sp6:131316.Seq 


M00004035C:A07 


37285 


RTA00000191AF.f.11.1 


M00004035D:B06 


17036 


RTA00000191AF.f.13.1 


M00004035D:B06 


17036 


99.A4.sp6:131233.Seq 


M00004072A:C03 




RTA00000191AF.j.9.1 


M00004072A:C03 




99.D4.sp6:131269.Seq 


M00004081C:D10 


15069 


99.F4.sp6:131293.Seq 


M00004081C:D10 


15069 


RTA00000191AF.1.6.1 


M00004086D:G06 


9285 


99.H4.sp6:131317.Seq 


M00004086D:G06 


9285 


RTA00000191AF.m.18.1 


M00004105C:A04 


7221 


99.D5.sp6:131270.Seq 


M00004105C:A04 


7221 


RTA00000191AF.p.9.1 

cDNA Library ES15 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00004171D:B03 

M00004171D:B03 

M00004185C:C03 

M00004185C:C03 

M00004185C:C03 

M00004191D:B11 

M00004191D:B11 

M00004191D:B11 

M00004197D:H01 

M00004197D:H01 

M00004197D:H01 

M00004203B:C12 

M00004203B:C12 

M00004214C:H05 

M00004214C:H05 

M00004223D:E04 

M00004223D:E04 

M00004269D:D06 

M00004269D:D06 

M00004295D:F12 

M00004295D:F12 

M00004296C:H07 

M00004296C:H07 

M00004307C:A06 

M00004307C:A06 

M00004307C:A06 

M00004312A:G03 

M00004312A:G03 

M00004312A:G03 

M00004318C:D10 

M00004318C:D10 

M00004359B:G02 

M00004359B:G02 

M00004505D:F08 

M00004505D:F08 

M00004692A:H08 

M00004692A:H08 

M00004692A:H08 

M00005180C:G03 



Cluster ID 

4908 

4908 

11443 

11443 

11443 



8210 

8210 

8210 

14311 

14311 

11451 

11451 

12971 

12971 

4905 

4905 

16921 

16921 

13046 

13046 

9457 

9457 

9457 

26295 

26295 

26295 

21847 

21847 



Sequence Name 

RTA00000192AFJ.2.1 

99.F6.sp6:131295.Seq 

RTA00000192AF.1.13.2 

123.A8.sp6:132272.Seq 

99.A7.sp6:131236.Seq 

RTA00000192AF.m.12.1 

99.B7.sp6:131248.Seq 

123.C8.sp6:132296.Seq 

99.C7.sp6:131260.Seq 

123.E8.sp6:132320.Seq 

RTA00000192AF.n.13.1 

99.D7.sp6:131272.Seq 

RTA00000192AF.o.2.1 

177.D8.sp6:134747.Seq 

RTA00000192AF.p.17.1 

RTA00000193AF.a.20.1 

99.B8.sp6:131249.Seq 

99.H8.sp6:131321.Seq 

RTA00000193AF.e.14.1 

99.D9.sp6:131274.Seq 

RTA00000193AF.h.15.1 

99.E9.sp6:131286.Seq 

RTA00000193AF.h.19.1 

RTA00000193AF.i.14.2 

99.F9.sp6:131298.Seq 

123.D11.sp6:132311.Seq 

RTA00000193AF.i.24.2 

99.G9.sp6:131310.Seq 

RTA00000193AF.i.24.2.Seq_THC197345 

RTA00000193AF.j.9.1 

99.H9.sp6:131322.Seq 

RTA00000193AF.m.5.1.Seq_THC173318 

RTA00000193AF.m.5.1 

RTA00000194AF.b.19.1 

99.H10.sp6:131323.Seq 

99.B11.sp6:131252.Seq 

RTA00000194AF.c.24.1 

377.F4.sp6:141957.Seq 

RTA00000194AF.f.4.1 

cDNA Library ES16 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00001346D:E03 

M00001350A:B08 

M00001350A:B08 

M00001357D:D11 

M00001357D:D11 

M00001409C:D12 

M00001409C:D12 

M00001418B:F03 

M00001418B:F03 

M00001418B:F03 

M00001418D:B06 

M00001421C:F01 

M00001421C:F01 

M00001429B:A11 

M00001432C:F06 

M00001439C:F08 

M00001442C:D07 

M00001442C:D07 

M00001443B:F01 

M00001443B:F01 

M00001445A:F05 

M00001445A:F05 

M00001446A:F05 

M00001455A:E09 

M00001455A:E09 

M00001460A:F12 

M00001481D:A05 

M00001490B:C04 

M00001490B:C04 

M00001500C:E04 

M00001500C:E04 

M00001532B:A06 

M00001532B:A06 

M00001534A:F09 

M00001534A:F09 

M00001535A:B01 

M00001536A:C08 

M00001536A:C08 

M00001541A:F07 

M00001542B:B01 

M00001542B:B01 

M00001544A:E03 

M00001545A:C03 

M00001545A:C03 

M00001545A:C03 

M00001548A:H09 

M00001548A:H09 

M00001548A:H09 



Cluster ID 
6806 



4059 
4059 
9577 
9577 
9952 
9952 
9952 
8526 
9577 
9577 
4635 

40054 
16731 
16731 



13532 

13532 

7801 

13238 

13238 

39498 

7985 

18699 

18699 

9443 

9443 

3990 

3990 

5321 

5321 

7665 

39392 

39392 

22085 



12170 

19255 

19255 

19255 

1058 

1058 

1058 



Sequence Name 

RTA00000177AF.g.13.3 

80.H2.sp6:130293.Seq 

RTA00000177AF16.2 

RTA00000177AF.n.18.3.Seq_THC123051 

RTA00000177AF.n.18.3 

RTA00000179AF.o.17.1 

80.E7.sp6:130262.Seq 

RTA00000180AF.c.20.1 

RTA00000180AF.c.20.1.Seq_THC162284 

80.E8.sp6:130263.Seq 

RTA00000180AF.d.1.1 

RTA00000180AF.d.23.1 

80.G8.sp6:130287.Seq 

RTA00000180AF.i.20.1 

RTA00000180AF.k.24.1 

RTA00000180AF.p.10.1 

RTA00000181AF.a.20.1 

80.C10.sp6:130241.Seq 

80.D10.sp6:130253.Seq 

RTA00000181AF.b.7.1 

80.E10.sp6:130265.Seq 

RTA00000181AF.c.4.1 

RTA00000181AF.c.21.1 

RTA00000181AF.m.4.1 

RTA00000181AF.m.4.1.Seq_THC140691 

RTA00000119A.j.20.1 

RTA00000182AR.j.2.1 

RTA00000182AF.m.16.1 

89.D3.sp6:130705.Seq 

89.B4.sp6:130682.Seq 

RTA00000183AF.c.1.1 

89.G6.sp6:130744.Seq 

RTA00000183AF.j.11.1 

89.B7.sp6:130685.Seq 

RTA00000183AF.k.8.1 

RTA00000134A.1.19.1 

89.G7.sp6:130745.Seq 

RTA00000134A.m.16.1 

RTA00000135A.e.5.2 

RTA00000183AF.p.4.1 

89.F8.sp6:130734.Seq 

RTA00000125A.h.18.4 

RTA00000135A.m.18.1 

184.B10.sp6:135547.Seq 

89.C9.sp6:130699.Seq 

RTA00000126A.e.20.3.Seq_THC217534 

RTA00000126A.e.20.3 

79.F6.sp6:130081.Seq 






cDNA Library ES16 - ATCC# 
Deposit Date - December 22, 1998 



Clone Name 

M00001549A:B02 

M00001549A:B02 

M00001549A:D08 

M00001552B:D04 

M00001552B:D04 

M00001552D:A01 

M00001552D:A01 

M00001553D:D10 

M00001553D:D10 

M00001558A:H05 

M00001558A:H05 

M00001561A:C05 

M00001561A:C05 

M00001564A:B12 

M00001578B:E04 

M00001579D:C03 

M00001579D:C03 

M00001579D:C03 

M00001582D:F05 

M00001587A:B11 

M00001587A:B11 

M00001604A:F05 

M00001604A:F05 

M00001624A:B06 

M00001624A:B06 

M00001624A:B06 

M00001630B:H09 

M00001630B:H09 

M00001630B:H09 

M00001651A:H01 

M00001651A:H01 

M00001677C:E10 

M00001679C:F01 

M00001679C:F01 

M00001679C:F01 

M00001686A:E06 

M00003796C:D05 

M00003796C:D05 

M00003826B:A06 

M00003826B:A06 

M00003833A:E05 

M00003837D:A01 

M00003837D:A01 

M00003846B:D06 

M00003846B:D06 

M00003879B:D10 

M00003879B:D10 

M00003879D:A02 



Cluster ID 

4015 

4015 

10944 

5708 

5708 



22814 
22814 



39486 

39486 

5053 

23001 

6539 

6539 

6539 

39380 

39380 

39391 

39391 

3277 

3277 

3277 

5214 

5214 

5214 



14627 

78091 

78091 

78091 

4622 

5619 

5619 

11350 

11350 

21877 

7899 

7899 

6874 

6874 

31587 

31587 

14507 



Sequence Name 

RTA00000136A.e.12.1 

79.G6.sp6:130093.Seq 

RTA00000126A.h.17.2 

RTA00000184AF.g.12.1 

89.E10.sp6:130724.Seq 

89.F10.sp6:130736.Seq 

RTA00000184AF.g.22.1 

RTA00000184AF.h.14.1 

89.A11.sp6:130677.Seq 

RTA00000128A.c.20.1 

89.F12.sp6:130738.Seq 
RTA00000128A.m.22.2 
79.B8.sp6:130035.Seq 
RTA00000184AF.o.12.1 
RTA00000185AF.c.24.1 

90.G1.sp6:130931.Seq 
173.A12.SP6:134080.Seq 
RTA00000185AF.d.11.1 
RTA00000185AF.d.24.1 
RTA00000129A.e.24.1 
79.E8.sp6:130071.Seq 
RTA00000138A.c.3.1 
79.A9.sp6:130024.Seq 
RTA00000138A.1.5.1 
217.E1.sp6:139406.Seq 
90.B4.sp6:130874.Seq 
90.D4.sp6:130898.Seq 
122.C2.sp6:132098.Seq 
RTA00000186AF.g.11.1 
RTA00000186AF.n.7.1 
90.A5.sp6:130863.Seq 
RTA00000187AF.g.23.1 
90.C7.sp6:130889.Seq 
RTA00000187AFJ.6.1 
176.G5.sp6:134588.Seq 
RTA00000187AF.m.15.2 
RTA00000188AF.1.9.1.Seq_THC167845 
RTA00000188AF.1.9.1 
RTA00000189AF.a.24.2 
90.F9.sp6:130927.Seq 
RTA00000189AF.b.21.1 

90.H9.sp6:130951.Seq 

RTA00000189AF.c.10.1 

RTA00000189AF.e.9.1 

90.C10.sp6:130892.Seq 

RTA00000189AF.1.20.1 

90.C12.sp6:130894.Seq 

90.D12.sp6:130906.Seq 

cDNA Library ES16 - ATCC# 
Deposit Date - December 22, 1998 


Clone Name 


Cluster ID 


Sequence Name 


M00003879D:A02 


14507 


RTA00000189AR.1.23.2 


M00003891C:H09 




90.G12.sp6:130942.Seq 


M00003891C:H09 




RTA00000189AF.p.8.1 


M00003912B:D01 


12532 


99.D1.sp6:131266.Seq 


M00003912B:D01 


12532 


RTA00000190AF.g.2.1 


M00004072B:B05 


17036 


RTA00000191AF.j.10.1 


M00004081C:D12 


14391 


RTA00000191AF.1.7.1 


M00004111D:A08 


6874 


RTA00000192AF.a.14.1 


M00004111D:A08 


6874 


99.F5.sp6:131294.Seq 


M00004121B:G01 




177.H4.sp6:134791.Seq 


M00004121B:G01 




99.H5.sp6:131318.Seq 


M00004121B:G01 




RTA00000192AF.c.2.1 


M00004138B:H02 


13272 


99.A6.sp6:131235.Seq 


M00004138B:H02 


13272 


RTA00000192AF.e.3.1 


M00004151D:B08 


16977 


RTA00000192AF.g.3.1 


M00004169C:C12 


5319 


99.E6.sp6:131283.Seq 


M00004169C:C12 


5319 


RTA00000192AF.i.12.1 


M00004169C:C12 


5319 


123.F7.sp6:132331.Seq 


M00004183C:D07 


16392 


RTA00000192AF.1.1.1 


M00004183C:D07 


16392 


RTA00000192AF.1.1.1.Seq_THC202071 


M00004230B:C07 


7212 


RTA00000193AF.b.14.1 


M00004230B:C07 


7212 


99.D8.sp6:131273.Seq 


M00004249D:F10 




RTA00000193AF.c.21.1.Seq_THC222602 


M00004249D:F10 




RTA00000193AF.c.21.1 


M00004275C:C11 


16914 


99.A9.sp6:131238.Seq 


M00004275C:C11 


16914 


RTA00000193AF.f.5.1 


M00004283B:A04 


14286 


RTA00000193AF.f.22.1 


M00004285B:E08 


56020 


RTA00000193AF.g.2.1 


M00004327B:H04 




RTA00000193AF.j.20.1 


M00004377C:F05 


2102 


RTA00000193AF.n.7.1 


M00004384C:D02 




RTA00000193AF.n.15.1 


M00004384C:D02 




RTA00000193AF.n.15.1.Seq_THC215687 


M00004461A:B08 




RTA00000194AR.a.10.2 


M00004461A:B09 




RTA00000194AF.a.11.1 


M00004691D:A05 




RTA00000194AF.c.23.1 


M00004896A:C07 




RTA00000194AF.d.13.1 



The above material has been deposited with the American Type Culture Collection, 
Rockville, Maryland, under the accession number indicated. This deposit will be maintained 
under the terms of the Budapest Treaty on the International Recognition of the Deposit of 
Microorganisms for purposes of Patent Procedure. The deposit will be maintained for a period 
of 30 years following issuance of this patent, or for the enforceable life of the patent, whichever 
is greater. Upon issuance of the patent, the deposit will be available to the public from the 
ATCC without restriction. 

This deposit is provided merely as a convenience to those of skill in the art, and is not an 
admission that a deposit is required under 35 U.S.C. §112. The sequence of the polynucleotides 
contained within the deposited material, as well as the amino acid sequence of the polypeptides 
encoded thereby, are incorporated herein by reference and are controlling in the event of any 
conflict with the written description of sequences herein. A license may be required to make, 
use, or sell the deposited material, and no such license is granted hereby. 

Retrieval of Individual Clones from Deposit of Pooled Clones 

Where the ATCC deposit is composed of a pool of cDNA clones, the deposit was 
prepared by first transfecting each of the clones into separate bacterial cells. The clones were 
then deposited as a pool of equal mixtures in the composite deposit. Particular clones can be 
obtained from the composite deposit using methods well known in the art. For example, a 
bacterial cell containing a particular clone can be identified by isolating single colonies, and 
identifying colonies containing the specific clone through standard colony hybridization 
techniques, using an oligonucleotide probe or probes designed to specifically hybridize to a 
sequence of the clone insert (e.g., a probe based upon unmasked sequence of the encoded 
polynucleotide having the indicated SEQ ID NO). The probe should be designed to have a Tm 
of approximately 80°C (assuming 2°C for each A or T and 4°C for each G or C). Positive 
colonies can then be picked, grown in culture, and the recombinant clone isolated. 
Alternatively, probes designed in this manner can be used in PCR to isolate a nucleic acid 
molecule from the pooled clones according to methods well known in the art, e.g., by purifying 
the cDNA from the deposited culture pool, and using the probes in PCR reactions to produce an 
amplified product having the corresponding desired polynucleotide sequence. 
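
By way of illustration only, the Tm rule of thumb stated above (2°C for each A or T and 4°C for each G or C) can be sketched in Python. The function names `wallace_tm` and `pick_probe`, the 2°C tolerance, and the example sequence are assumptions introduced for this sketch; they are not part of the deposit and do not describe any particular probe-design software.

```python
from typing import Optional


def wallace_tm(probe: str) -> int:
    """Estimate Tm in degrees C as 2 per A/T plus 4 per G/C."""
    seq = probe.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("probe must contain only A, C, G, or T")
    return 2 * at + 4 * gc


def pick_probe(insert: str, target_tm: int = 80, tolerance: int = 2) -> Optional[str]:
    """Return the first window of the insert whose estimated Tm is within
    `tolerance` of `target_tm`, or None if no such window exists.

    Any character other than A, C, G, or T (for example a masked base)
    ends the current window, so the probe is drawn only from unmasked sequence.
    """
    seq = insert.upper()
    for start in range(len(seq)):
        tm = 0
        for end in range(start, len(seq)):
            base = seq[end]
            if base in "AT":
                tm += 2
            elif base in "GC":
                tm += 4
            else:
                break  # masked or ambiguous base: abandon this window
            if abs(tm - target_tm) <= tolerance:
                return seq[start:end + 1]
            if tm > target_tm + tolerance:
                break  # window already too long; try the next start
    return None


if __name__ == "__main__":
    # Hypothetical insert sequence, used only to exercise the functions.
    example = "AAAAA" + "GC" * 10 + "TTTTT"
    probe = pick_probe(example)
    if probe is not None:
        print(probe, wallace_tm(probe))
```

A probe chosen this way is only a starting point; the rule ignores salt concentration and probe length effects, so hybridization conditions would still need to be confirmed empirically.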
Table 1. Sequence identification numbers, cluster ID, sequence name, and clone name 



SEQ ID NO: Cluster ID 


Sequence Name 


Clone Name 


1 


4635 


RTA00000180AF.i.20.1 


M00001429B:A11 


2 




RTA00000185AF.n.12.1 


M00001608D:A11 


3 


4622 


RTA00000187AF.m.15.2 


M00001686A:E06 


4 


3706 


RTA00000191AF.i.17.2 


M00004068B:A01 


5 


36535 


RTA00000181AF.f.5.1 


M00001449A:G10 


6 


3990 


RTA00000183AF.j.11.1 


M00001532B:A06 


7 


5319 


RTA00000192AF.i.12.1 


M00004169C:C12 


8 


36393 


RTA00000180AF.c.2.1 


M00001417A:E02 


9 


2623 


RTA00000183AF.a.6.1 


M00001497A:G02 


10 


7587 


RTA00000178AF.n.24.1 


M00001387B:G03 


11 


7065 


RTA00000137A.g.6.1 


M00001557A:D02 


12 


10539 


RTA00000187AF.1.7.1 


M00001680D:F08 


13 


27250 


RTA00000181AF.g.10.1 


M00001450A:D08 


14 


5556 


RTA00000179AF.n.10.1 


M00001407B:D11 


15 




RTA00000192AF.m.12.1 


M00004191D:B11 


16 


8761 


RTA00000184AF.k.12.1 


M00001557D:D09 


17 


4622 


RTA00000189AF.g.1.1 


M00003856B:C02 


18 


11460 


RTA00000187AF.g.12.1 


M00001676B:F05 


19 


16283 


RTA00000120A.o.20.1 


M00001467A:D08 


20 


3430 


RTA00000191AF.a.9.1 


M00003981A:E10 


21 


7065 


RTA00000184AF.j.21.1 


M00001557A:D02 


22 




RTA00000182AF.1.20.1 


M00001488B:F12 


23 




RTA00000123A.g.l9.1 


M00001531A:H11 


24 


16918 


RTA00000193AF.a.16.1 


M00004223A:G10 


25 


16914 


RTA00000193AF.f.5.1 


M00004275C:C11 


26 


40108 


RTA00000187AF.o.24.1 


M00003741D:C09 


27 


14286 


RTA00000193AF.f.22.1 


M00004283B:A04 


28 


17004 


RTA00000186AF.b.21.1 


M00001617C:E02 


29 




RTA00000180AF.g.22.1 


M00001426B:D12 


30 


13272 


RTA00000192AF.e.3.1 


M00004138B:H02 


31 




RTA00000194AF.f.4.1 


M00005180C:G03 


32 


32663 


RTA00000118A.1.8.1 


M00001450A:A11 


33 




RTA00000180AF.a.9.1 


M00001414A:B01 


34 


5832 


RTA00000178AF.o.23.1 


M00001388D:G05 


35 


7801 


RTA00000181AF.c.21.1 


M00001446A:F05 


36 


76760 


RTA00000187AF.a.15.1 


M00001657D:F08 


37 


40132 


RTA00000178AF.c.7.1 


M00001365C:C10 


38 




RTA00000183AF.e.1.1 


M00001505C:C05 


39 


4016 


RTA00000118A.c.4.1 


M00001395A:C03 


40 


5382 


RTA00000187AF.m.23.2 


M00001688C:F09 


41 


5693 


RTA00000190AF.p.17.2 


M00003978B:G05 


42 


307 


RTA00000136A.o.4.2 


M00001552A:B12 


43 


39833 


RTA00000178AF.i.23.1 


M00001378B:B02 


44 




RTA00000193AF.m.5.1 


M00004359B:G02 


45 


5325 


RTA00000191AF.o.6.1 


M00004093D:B12 


46 


5325 


RTA00000191AF.o.6.2 


M00004093D:B12 


47 


18957 


RTA00000190AR.m.9.1 


M00003958A:H02 


48 


39508 


RTA00000120A.o.2.1 


M00001467A:D04 


49 


22390 


RTA00000136A.j.13.1 


M00001551A:G06 


50 


12170 


RTA00000125A.h.18.4 


M00001544A:E03 


51 


4393 


RTA00000187AF.n.17.1 


M00001693C:G01 


52 


19 


RTA00000182AF.b.7.1 


M00001463C:B11 


53 




RTA00000193AF.c.21.1 


M00004249D:F10 


54 


7899 


RTA00000189AF.c.10.1 


M00003837D:A01 


55 


40073 


RTA00000191AF.e.3.1 


M00004028D:C05 


56 


7005 


RTA00000179AF.o.22.1 


M00001410A:D07 


57 




RTA00000187AF.h.22.1 


M00001679A:F06 


58 


18957 


RTA00000190AF.m.9.2 


M00003958A:H02 


59 


18957 


RTA00000183AF.h.23.1 


M00001528A:F09 


60 


16283 


RTA00000182AF.c.22.1 


M00001467A:D08 


61 


6974 


RTA00000183AF.d.9.1 


M00001504C:H06 


62 


2623 


RTA00000183AF.b.14.1 


M00001500A:E11 


63 


9105 


RTA00000191AF.a.21.2 


M00003983A:A05 


64 


13238 


RTA00000181AF.m.4.1 


M00001455A:E09 


65 


5749 


RTA00000185AF.a.19.1 


M00001571C:H06 


66 


6455 


RTA00000193AF.b.9.1 


M00004229B:F08 


67 


23001 


RTA00000185AF.c.24.1 


M00001578B:E04 


68 


6455 


RTA00000192AF.g.23.1 


M00004157C:A09 


69 


13595 


RTA00000189AF.f.8.1 


M00003851B:D10 


70 


39442 


RTA00000120A.o.21.1 


M00001467A:E10 


71 


17036 


RTA00000191AF.f.13.1 


M00004035D:B06 


72 




RTA00000183AF.g.9.1 


M00001513B:G03 


73 


7005 


RTA00000181AF.k.24.1 


M00001454B:C12 


74 


6268 


RTA00000126A.o.23.1 


M00001551A:B10 


75 


16130 


RTA00000119A.c.13.1 


M00001453A:E11 


76 


23201 


RTA00000187AF.a.14.1 


M00001657D:C03 


77 


5321 


RTA00000183AF.k.8.1 


M00001534A:F09 



78 


13157 


RTA00000186AF.a.6.1 


M00001614C:F10 


79 


2102 


RTA00000193AF.n.7.1 


M00004377C:F05 


80 


1058 


RTA00000126A.e.20.3 


M00001548A:H09 


81 


40392 


RTA00000180AFJ.8.1 


M00001429D:D07 


82 




RTA00000183AF.e.23.1 


M00001506D:A09 


83 


11476 


RTA00000187AF.p.19.1 


M00003747D:C05 


84 


3584 


RTA00000177AF.h.20.1 


M00001349B:B08 


85 


10470 


RTA00000180AF.f.18.1 


M00001424B:G09 


86 


39425 


RTA00000133A.f.1.1 


M00001470A:C04 


87 


5175 


RTA00000184AF.f.3.1 


M00001550A:G01 


88 


13576 


RTA00000189AF.o.13.1 


M00003885C:A02 


89 


7665 


RTA00000134A.il 9.1 


M00001535A:B01 


90 


16927 


RTA00000177AF.i.9.3 


M00001348B:B04 


91 


6660 


RTA00000187AF.i.15.1 


M00001679A:A06 


92 


2433 


RTA00000191AF.a.15.2 


M00003982C:C02 


93 


5097 


RTA00000134A.k.1.1 


M00001534A:D09 


94 


21847 


RTA00000193AF.j.9.1 


M00004318C:D10 


95 


3277 


RTA00000138A.1.5.1 


M00001624A:B06 


96 


5708 


RTA00000184AF.g.12.1 


M00001552B:D04 


97 


945 


RTA00000178AR.a.20.1 


M00001362C:H11 


98 


16269 


RTA00000178AF.p.1.1 


M00001389A:C08 


99 




RTA00000183AF.c.24.1 


M00001504A:E01 


100 


16731 


RTA00000181AF.a.20.1 


M00001442C:D07 


101 


12439 


RTA00000190AF.o.24.1 


M00003975A:G11 


102 


3162 


RTA00000177AF.j.12.3 


M00001351B:A08 


103 




RTA00000194AF.b.19.1 


M00004505D:F08 


104 




RTA00000193AF.n.15.1 


M00004384C:D02 


105 




RTA00000186AF.n.7.1 


M00001651A:H01 


106 


10717 


RTA00000181AF.d.10.1 


M00001447A:G03 


107 


4573 


RTA00000189AF.j.12.1 


M00003871C:E02 


108 




RTA00000186AF.h.14.1 


M00001632D:H07 


109 


11443 


RTA00000192AF.1.13.2 


M00004185C:C03 


110 


5892 


RTA00000184AF.d.11.1 


M00001548A:E10 


111 


3162 


RTA00000177AF.j.12.1 


M00001351B:A08 


112 


10470 


RTA00000185AF.k.6.1 


M00001597D:C05 


113 


17055 


RTA00000187AF.m.3.1 


M00001682C:B12 


114 


2030 


RTA00000193AF.m.20.1 


M00004372A:A03 


115 


6558 


RTA00000184AF.m.21.1 


M00001560D:F10 


116 


23255 


RTA00000190AFJ.4.1 


M00003922A:E06 


117 


9577 


RTA00000179AF.o.17.1 


M00001409C:D12 



118 




RTA00000180AF.a.11.1 


M00001414C:A07 


119 


8 


RTA00000181AF.e.17.1 


M00001448D:C09 


120 


67907 


RTA00000188AF.g.11.1 


M00003774C:A03 


121 


12081 


RTA00000133A.d.14.2 


M00001469A:C10 


122 


2448 


RTA00000119A.j.21.1 


M00001460A:F06 


123 


3389 


RTA00000189AF.g.3.1 


M00003857A:G10 


124 


39174 


RTA00000124A.n.13.1 


M00001541A:H03 


125 


24488 


RTA00000190AF.n.16.1 


M00003968B:F06 


126 


8210 


RTA00000192AF.n.13.1 


M00004197D:H01 


127 




RTA00000135A.1.2.2 


M00001545A:B02 


128 


40455 


RTA00000190AF.m.10.2 


M00003958C:G10 


129 


9577 


RTA00000180AF.d.23.1 


M00001421C:F01 


130 


13183 


RTA00000192AF.a.24.1 


M00004114C:F11 


131 


5214 


RTA00000186AF.g.11.1 


M00001630B:H09 


132 


67252 


RTA00000187AF.o.6.1 


M00001716D:H05 


133 


3108 


RTA00000188AF.d.24.1 


M00003763A:F06 


134 


2464 


RTA00000178AF.n.18.1 


M00001387A:C05 


135 


36313 


RTA00000181AF.e.23.1 


M00001448D:H01 


136 


23255 


RTA00000177AF.e.14.3 


M00001343D:H07 


137 


7985 


RTA00000182AR.j.2.1 


M00001481D:A05 


138 


8286 


RTA00000183AF.o.1.1 


M00001540A:D06 


139 


22195 


RTA00000180AF.g.7.1 


M00001425B:H08 


140 


4573 


RTA00000184AF.h.9.1 


M00001553B:F12 


141 


26875 


RTA00000187AF.i.1.1 


M00001679A:F10 


142 


7187 


RTA00000177AF.i.8.2 


M00001350A:H01 


143 


86859 


RTA00000118A.p.8.1 


M00001452A:B12 


144 


4623 


RTA00000185AF.f.4.1 


M00001586C:C05 


145 




RTA00000121A.c.10.1 


M00001469A:A01 


146 


10185 


RTA00000183AF.d.5.1 


M00001504C:A07 


147 




RTA00000183AF.p.4.1 


M00001542B:B01 


148 


15069 


RTA00000191AF.1.6.1 


M00004081C:D10 


149 


39304 


RTA00000118A.j.21.1 


M00001450A:A02 


150 


8672 


RTA00000190AF.f.11.1 


M00003909D:C03 


151 


13576 


RTA00000177AF.g.16.1 


M00001347A:B10 


152 


6293 


RTA00000185AF.e.11.1 


M00001583D:A10 


153 


16977 


RTA00000192AF.g.3.1 


M00004151D:B08 


154 


5345 


RTA00000189AF.1.19.1 


M00003879B:C11 


155 


4905 


RTA00000193AF.e.14.1 


M00004269D:D06 


156 


17036 


RTA00000191AF.j.10.1 


M00004072B:B05 


157 


5417 


RTA00000191AF.h.19.1 


M00004059A:D06 



158 


7172 


RTA00000178AF.f.9.1 


M00001371C:E09 


159 


40044 


RTA00000186AF.d.1.1 


M00001621C:C08 


160 


4386 


RTA00000184AF.j.4.1 


M00001556B:C08 


161 


40044 


RTA00000183AF.g.22.1 


M00001514C:D11 


162 


9685 


RTA00000183AF.c.11.1 


M00001501D:C02 


163 


22155 


RTA00000185AF.n.9.1 


M00001608B:E03 


164 


10515 


RTA00000189AF.f.18.1 


M00003853A:F12 


165 


6539 


RTA00000185AF.d.11.1 


M00001579D:C03 


166 


15066 


RTA00000180AF.e.24.1 


M00001423B:E07 


167 


4261 


RTA00000180AF.h.5.1 


M00001426D:C08 


168 


13864 


RTA00000125A.m.9.1 


M00001545A:D08 


169 


6539 


RTA00000189AF.d.22.1 


M00003844C:B11 


170 


11465 


RTA00000185AF.m.19.1 


M00001607A:E11 


171 


3266 


RTA00000184AR.g.1.1 


M00001551C:G09 


172 


102 


RTA00000184AF.o.5.1 


M00001563B:F06 


173 


16970 


RTA00000181AR.1.18.2 


M00001452C:B06 


174 


12971 


RTA00000193AF.a.20.1 


M00004223D:E04 


175 


5007 


RTA00000177AF.B.2.1 


M00001346A:F09 


176 


3765 


RTA00000135A.d.1.1 


M00001541A:D02 


177 


11294 


RTA00000184AF.j.6.1 


M00001556B:G02 


178 


3681 


RTA00000131A.g.15.2 


M00001449A:D12 


179 


9283 


RTA00000181AR.m.21.2 


M00001455D:F09 


180 


18699 


RTA00000182AF.m.16.1 


M00001490B:C04 


181 


86110 


RTA00000181AF.f.12.1 


M00001449C:D06 


182 


39648 


RTA00000178AR.1.8.2 


M00001383A:C03 


183 


7337 


RTA00000123A.b.17.1 


M00001528A:C04 


184 


1334 


RTA00000178AF.j.7.1 


M00001379A:A05 


185 


17076 


RTA00000188AF.d.21.1 


M00003762C:B08 


186 


22794 


RTA00000138A.b.5.1 


M00001601A:D08 


187 


39171 


RTA00000186AF.1.7.1 


M00001644C:B07 


188 


8551 


RTA00000179AF.p.21.1 


M00001412B:B10 


189 


5857 


RTA00000118A.g.14.1 


M00001449A:A12 


190 


9443 


RTA00000183AF.c.1.1 


M00001500C:E04 


191 


9457 


RTA00000193AF.i.14.2 


M00004307C:A06 


192 


7206 


RTA00000182AF.o.15.1 


M00001494D:F06 


193 


22979 


RTA00000178AF.k.22.1 


M00001382C:A02 


194 


40455 


RTA00000190AR.m.10.1 


M00003958C:G10 


195 


7221 


RTA00000191AF.p.9.1 


M00004105C:A04 


196 




RTA00000191AF.j.9.1 


M00004072A:C03 


197 


7239 


RTA00000126A.m.4.2 


M00001550A:A03 



198 


31587 


RTA00000189AF.1.20.1 


M00003879B:D10 


199 


16317 


RTA00000190AF.e.6.1 


M00003907D:H04 


200 


13576 


RTA00000189AR.o.13.1


M00003885C:A02 


201 


5779 


RTA00000177AF.g.14.3


M00001346D:G06 


202 


6124 


RTA00000191AR.e.2.3 


M00004028D:A06 


203 


9952 


RTA00000180AF.c.20.1


M00001418B:F03 


204 




RTA00000188AF.i.8.1 


M00003784D:D12 


205 


5779 


RTA00000177AF.g.14.1


M00001346D:G06 


206 


39490 


RTA00000128A.b.4.1 


M00001557A:F03 


207 


4416 


RTA00000187AF.h.13.1


M00001678D:F12 


208 


4009 


RTA00000179AF.e.20.1 


M00001396A:C03 


209 


5336 


RTA00000183AF.b.13.1


M00001500A:C05 


210 


39186 


RTA00000121A.D.15.1


M00001512A:A09 


211 


40122 


RTA00000190AF.n.23.1 


M00003970C:B09 


212 


12532 


RTA00000190AF.g.2.1 


M00003912B:D01 


213 


8078 


RTA00000177AR.l.13.1


M00001353A:G12 


214 


3900 


RTA00000190AF.g.13.1


M00003914C:F05 


215 


7589 


RTA00000120A.p.23.1 


M00001468A:F05 


216 


8298 


RTA00000127A.d.19.1


M00001553A:H06 


217 


4443 


RTA00000177AF.b.20.4 


M00001341A:E12 


218 


26295 


RTA00000193AF.j.24.2


M00004312A:G03 


219 


3389 


RTA00000183AF.m.19.1


M00001537B:G07 


220 


7015 


RTA00000187AF.f.18.1


M00001673C:H02 


221 


8526 


RTA00000180AF.d.1.1


M00001418D:B06 


222 


4665 


RTA00000186AF.m.3.1 


M00001648C:A01 


223 


1399 


RTA00000129A.o.10.1


M00001604A:B10 


224 


9244 


RTA00000127A.l.3.1


M00001556A:C09 


225 




RTA00000179AF.j.13.1


M00001400B:H06 


226 


82498 


RTA00000118A.m.10.1


M00001450A:B12 


227 


35702 


RTA00000187AR.c.15.2


M00001663A:E04


228 


38759 


RTA00000120A.m.12.3


M00001467A:B07 


229 


39648 


RTA00000178AF.l.8.1


M00001383A:C03 


230 


19105 


RTA00000133A.e.15.1


M00001469A:H12 


231 


85064 


RTA00000131A.m.23.1 


M00001452A:F05 


232 


9285 


RTA00000191AF.m.18.1


M00004086D:G06 


233 


9285 


RTA00000190AF.d.7.1 


M00003906C:E10 


234 


39391 


RTA00000138A.c.3.1


M00001604A:F05


235 




RTA00000178AF.d.20.1 


M00001368D:E03 


236 


39498 


RTA00000119A.j.20.1 


M00001460A:F12 


237 


7798 


RTA00000189AF.k.12.1


M00003876D:E12 



238 


7798 


RTA00000189AF.c.18.1


M00003839A:D08 


239 


19829 


RTA00000125A.h.24.4 


M00001544A:G02 


240 




RTA00000188AF.d.11.1


M00003761D:A09 


241 


4275 


RTA00000120A.j.14.1


M00001466A:E07 


242 


22113 


RTA00000125A.c.7.1


M00001542A:A09 


243 


40314 


RTA00000186AF.c.15.1


M00001619C:F12 


244 


10944 


RTA00000126A.h.17.2


M00001549A:D08 


245 


39809 


RTA00000190AF.e.3.1 


M00003907D:A09


246 


22085 


RTA00000135A.e.5.2 


M00001541A:F07 


247 


19255 


RTA00000135A.m.18.1


M00001545A:C03


248 


14311 


RTA00000192AF.o.2.1


M00004203B:C12 


249 


8479 


RTA00000189AF.j.22.1 


M00003875C:G07 


250




RTA00000189AF.j.23.1 


M00003875D:D11 


251


4193 


RTA00000184AF.e.13.1


M00001549B:F06 


252


22814 


RTA00000184AF.h.14.1


M00001553D:D10 


253


39563 


RTA00000179AF.k.20.1 


M00001402A:E08 


254


39420 


RTA00000134A.o.23.1


M00001537A:F12 


255


11589 


RTA00000177AF.b.17.4


M00001340D:F10 


256


4937 


RTA00000191AF.p.21.1 


M00004108A:E06 


257


39412 


RTA00000133A.i.17.1


M00001511A:H06 


258


4837 


RTA00000185AR.k.3.2 


M00001597C:H02 


259


13046 


RTA00000193AF.h.19.1


M00004296C:H07 


260


4141 


RTA00000177AF.p.20.3 


M00001361A:A05 


261


38085 


RTA00000123A.e.15.1


M00001531A:D01 


262 




RTA00000189AF.p.8.1 


M00003891C:H09 


263 


11451 


RTA00000192AF.D.17.1 


M00004214C:H05 


264 


14507 


RTA00000189AR.l.23.2


M00003879D:A02 


265 


40054 


RTA00000180AF.p.10.1


M00001439C:F08 


266 


39423 


RTA00000134A.i.22.1


M00001535A:F10 


267 


39453 


RTA00000135A.g.11.1


M00001542A:E06 


268 


10751 


RTA00000187AF.k.7.1 


M00001679D:D03 


269 


10751 


RTA00000187AF.k.6.1 


M00001679D:D03 


270 


78091 


RTA00000187AF.j.6.1 


M00001679C:F01 


271 


39539 


RTA00000127A.i.21.1 


M00001555A:B02 


272 




RTA00000182AF.l.15.1


M00001487B:H06 


273 




RTA00000194AF.d.13.1


M00004896A:C07 


274 




RTA00000128A.c.20.1


M00001558A:H05 


275 


9283 


RTA00000181AR.m.22.2 


M00001455D:F09 


276 


39168 


RTA00000121A.l.10.1


M00001507A:H05 


277 


39458 


RTA00000126A.p.15.2


M00001552A:D11 



278


14391


RTA00000177AF.m.17.3


M00001355B:G10


279


39195


RTA00000137A.c.16.1


M00001555A:C01


280


7212


RTA00000193AF.b.14.1


M00004230B:C07


281


4015


RTA00000136A.e.12.1


M00001549A:B02


282


12977


RTA00000189AF.j.19.1


M00003875B:F04


283




RTA00000178AF.m.13.1


M00001384B:A11


284


14391


RTA00000191AF.l.7.1


M00004081C:D12


285




RTA00000194AF.c.23.1


M00004691D:A05


286




RTA00000181AF.b.7.1


M00001443B:F01


287


8358


RTA00000183AF.i.5.1


M00001528B:H04


288


1267


RTA00000125A.o.5.1


M00001546A:G11


289




RTA00000189AF.f.7.1


M00003851B:D08


290


16347


RTA00000184AF.e.15.1


M00001549C:E06


291


7899


RTA00000193AF.a.17.1


M00004223B:D09


292


2379


RTA00000178AF.a.6.1


M00001361D:F08


293


39478


RTA00000133A.i.5.1


M00001471A:B01


294


39392


RTA00000134A.m.16.1


M00001536A:C08


295


5053


RTA00000184AF.o.12.1


M00001564A:B12


296


16999


RTA00000185AF.k.9.1


M00001598A:G03


297


39180


RTA00000126A.n.8.2


M00001551A:F05


298


1037


RTA00000121A.f.8.1


M00001470A:B10


299


6867


RTA00000178AF.e.12.1


M00001370A:C09


300


10539


RTA00000183AF.a.24.1


M00001499B:A11


301


41633


RTA00000118A.S.16.1


M00001449A:B12


302


23218


RTA00000187AR.c.5.2


M00001662C:A09


303


39380


RTA00000129A.e.24.1


M00001587A:B11


304




RTA00000185AF.d.24.1


M00001582D:F05


305




RTA00000177AF.o.4.3


M00001358C:C06


306


6974


RTA00000184AF.a.15.1


M00001544B:B07


307




RTA00000185AF.g.11.1


M00001590B:F03


308


15855


RTA00000184AF.j.1.1


M00001556A:H01


309


84328


RTA00000118A.D.10.1


M00001452A:B04


310


10145


RTA00000120A.g.12.1


M00001465A:B11


311


39805


RTA00000177AF.c.21.3


M00001342B:E06


312




RTA00000187AF.h.23.1


M00001679A:F06


313


6298


RTA00000187AR.i.10.2


M00001679B:F01


314


14367


RTA00000187AF.e.8.1


M00001670C:H02


315




RTA00000193AF.c.22.1


M00004249D:G12


316


16921


RTA00000183AF.i.6.1


M00001534A:C04


317


1577


RTA00000184AF.i.23.1


M00001556A:F11


318 


8773 


RTA00000187AF.f.24.1 


M00001675A:C09 


319 




RTA00000194AF.a.11.1


M00004461A:B09 


320 


39886 


RTA00000178AF.j.24.1


M00001380D:B09 


321 


13532 


RTA00000181AF.c.4.1


M00001445A:F05 


322 




RTA00000193AF.d.2.1 


M00004251C:G07 


323 


5257 


RTA00000192AF.f.3.1 


M00004146C:C11 


324 


9061 


RTA00000191AR.e.11.2


M00004031A:A12


325 


19267 


RTA00000186AF.l.12.1


M00001645A:C12 


326 


20212 


RTA00000134A.l.22.1


M00001535A:C06 


327 


16653 


RTA00000181AF.k.5.3 


M00001453C:F06 


328 


16985 


RTA00000177AF.h.10.1


M00001348B:G06 


329 


12977 


RTA00000189AR.j.19.1


M00003875B:F04 


330 


9061 


RTA00000191AR.e.11.3


M00004031A:A12 


331 




RTA00000194AR.a.10.2


M00004461A:B08 


332 


6468 


RTA00000187AF.d.15.1


M00001669B:F02 


333 


16392 


RTA00000192AF.l.1.1


M00004183C:D07 


334 


14627 


RTA00000187AF.g.23.1


M00001677C:E10 


335 


6583 


RTA00000179AF.d.13.1


M00001394A:F01 


336 


6806 


RTA00000177AF.g.13.3


M00001346D:E03 


337 


9635 


RTA00000137A.e.23.4 


M00001557A:F01 


338 


689 


RTA00000181AR.l.22.1


M00001454D:G03 


339 


4119 


RTA00000183AF.k.16.1


M00001534C:A01 


340 


8952 


RTA00000183AF.h.15.1


M00001518C:B11 


341 


2379 


RTA00000192AF.p.8.1 


M00004212B:C07 


342 


39486 


RTA00000128A.m.22.2 


M00001561A:C05 


343 


21877 


RTA00000189AF.b.21.1 


M00003833A:E05 


344 


6874 


RTA00000192AF.a.14.1


M00004111D:A08 


345 


6874 


RTA00000189AF.e.9.1 


M00003846B:D06 


346 


37285 


RTA00000191AF.f.11.1


M00004035C:A07 


347 




RTA00000193AF.j.20.1 


M00004327B:H04 


348 


7674 


RTA00000118A.g.9.1 


M00001416A:H01 


349 


2797 


RTA00000180AF.i.19.1


M00001429A:H04 


350 




RTA00000184AF.g.22.1


M00001552D:A01 


351 


7802 


RTA00000185AF.n.5.1 


M00001608A:B03 


352 


16921 


RTA00000193AF.h.15.1


M00004295D:F12 


353 


11494 


RTA00000192AF.j.6.1 


M00004172C:D08 


354 


17062 


RTA00000177AF.b.8.4 


M00001340B:A06 


355 


16245 


RTA00000177AF.k.9.3 


M00001352A:E02 


356 


83103 


RTA00000119A.e.24.2 


M00001454A:A09 


357 


4309 


RTA00000186AF.e.22.1 


M00001624C:F01 


358 


13072 


RTA00000181AR.m.5.2 


M00001455B:E12 


359 


4059 


RTA00000177AF.n.18.3


M00001357D:D11 


360 


5178 


RTA00000178AF.n.10.1


M00001386C:B12 


361 


1120 


RTA00000118A.p.15.3


M00001452A:D08 


362 


6420 


RTA00000183AF.d.11.1


M00001504D:G06 


363 


13913 


RTA00000186AF.e.6.1 


M00001623D:F10 


364 




RTA00000192AF.c.2.1


M00004121B:G01 


365 


3956 


RTA00000183AF.g.3.1 


M00001512D:G09 


366 


14364 


RTA00000183AF.g.12.1


M00001513C:E08 


367 


6880 


RTA00000191AF.m.20.1 


M00004087D:A01 


368 


84182 


RTA00000180AF.h.19.1


M00001428A:H10 


369 


2790 


RTA00000177AF.e.2.1 


M00001343C:F10 


370 


4561 


RTA00000184AF.i.21.1


M00001555D:G10 


371 


8847 


RTA00000180AF.b.16.1


M00001416B:H11 


372 


56020 


RTA00000193AF.g.2.1 


M00004285B:E08 


373 


1531 


RTA00000119A.o.3.1


M00001461A:D06 


374 


6420 


RTA00000177AF.f.10.3


M00001345A:E01 


375 




RTA00000188AF.b.12.1


M00003754C:E09 


376 




RTA00000180AF.k.24.1 


M00001432C:F06


377 




RTA00000184AF.a.8.1 


M00001544A:E06 


378 


2696 


RTA00000134A.m.13.1


M00001536A:B07 


379 


260 


RTA00000185AR.i.12.2


M00001594B:H04 


380 


11350 


RTA00000189AF.a.24.2 


M00003826B:A06 


381 


2428 


RTA00000123A.l.21.1


M00001533A:C11 


382 


4313 


RTA00000122A.n.3.1 


M00001517A:B07 


383 




RTA00000184AF.p.3.1 


M00001566B:D11 


384 


697 


RTA00000188AF.d.6.1 


M00003759B:B09 


385 


5619 


RTA00000188AF.l.9.1


M00003796C:D05 


386 


4568 


RTA00000122A.d.15.3


M00001513A:B06 


387 




RTA00000177AF.i.6.2 


M00001350A:B08 


388 


5622 


RTA00000178AF.a.11.1


M00001362B:D10 


389 


7514 


RTA00000184AF.k.21.1 


M00001558B:H11 


390 


5619 


RTA00000189AF.f.17.1


M00003853A:D04 


391 


7570 


RTA00000187AF.g.24.1 


M00001677D:A07 


392 


23358 


RTA00000190AF.o.21.1


M00003974D:H02 


393 


23210 


RTA00000190AF.o.20.1


M00003974D:E07 


394 


5192 


RTA00000184AF.k.2.1 


M00001557B:H10 


395 


13538 


RTA00000180AF.a.24.1 


M00001415A:H06 


396 




RTA00000189AF.h.17.1


M00003867A:D10 


397 




RTA00000192AF.o.11.1


M00004205D:F06 


398 




RTA00000184AF.l.11.1


M00001559B:F01 


399 


4718 


RTA00000189AF.g.5.1 


M00003857A:H03 


400 


14929 


RTA00000177AF.m.1.2


M00001353D:D10 


401 


4908 


RTA00000192AF.j.2.1 


M00004171D:B03 


402 




RTA00000178AF.k.16.1


M00001381D:E06 


403 




RTA00000194AF.c.24.1


M00004692A:H08 


404 


17732 


RTA00000178AR.i.2.2 


M00001376B:G06 


405 


17062 


80.A1.sp6:130208.Seq


M00001340B:A06 


406 


11589 


80.B1.sp6:130220.Seq


M00001340D:F10 


407 


4443 


80.C1.sp6:130232.Seq


M00001341A:E12 


408 


39805 


80.D1.sp6:130244.Seq


M00001342B:E06 


409 


2790 


80.E1.sp6:130256.Seq


M00001343C:F10 


410 


23255 


80.F1.sp6:130268.Seq


M00001343D:H07 


411 


6420 


80.G1.sp6:130280.Seq


M00001345A:E01 


412 


5007 


80.H1.sp6:130292.Seq


M00001346A:F09 


413 


13576 


80.D2.sp6:130245.Seq 


M00001347A:B10 


414 


16927 


80.E2.sp6:130257.Seq 


M00001348B:B04 


415 


16985 


80.F2.sp6:130269.Seq 


M00001348B:G06 


416 


3584 


80.G2.sp6:130281.Seq 


M00001349B:B08 


417 




80.H2.sp6:130293.Seq 


M00001350A:B08 


418 


7187 


80.A3.sp6:130210.Seq 


M00001350A:H01 


419 


16245 


80.D3.sp6:130246.Seq 


M00001352A:E02 


420 


8078 


80.E3.sp6:130258.Seq 


M00001353A:G12 


421 


14929 


80.F3.sp6:130270.Seq 


M00001353D:D10 


422 


14391 


80.G3.sp6:130282.Seq 


M00001355B:G10 


423 


4141 


80.B4.sp6:130223.Seq 


M00001361A:A05


424 


2379 


80.C4.sp6:130235.Seq 


M00001361D:F08 


425 


5622 


80.D4.sp6:130247.Seq 


M00001362B:D10 


426 


945 


80.E4.sp6:130259.Seq 


M00001362C:H11


427 


40132 


80.F4.sp6:130271.Seq 


M00001365C:C10 


428 




80.G4.sp6:130283.Seq 


M00001368D:E03 


429 


6867 


80.H4.sp6:130295.Seq 


M00001370A:C09 


430 


7172 


80.A5.sp6:130212.Seq


M00001371C:E09


431 


17732 


80.B5.sp6:130224.Seq


M00001376B:G06 


432 


39833 


80.C5.sp6:130236.Seq


M00001378B:B02 


433 


1334 


80.D5.sp6:130248.Seq 


M00001379A:A05 


434 


39886 


80.E5.sp6:130260.Seq 


M00001380D:B09 


435 




80.F5.sp6:130272.Seq 


M00001381D:E06 


436 


22979 


80.G5.sp6:130284.Seq 


M00001382C:A02


437 


39648 


80.H5.sp6:130296.Seq 


M00001383A:C03 


438 




80.B6.sp6:130225.Seq 


M00001384B:A11


439 


5178 


80.C6.sp6:130237.Seq 


M00001386C:B12 


440 


2464 


80.D6.sp6:130249.Seq 


M00001387A:C05 


441 


7587 


80.E6.sp6:130261.Seq


M00001387B:G03 


442 


5832 


80.F6.sp6:130273.Seq 


M00001388D:G05


443 


16269 


80.G6.sp6:130285.Seq 


M00001389A:C08 


444 


6583 


80.H6.sp6:130297.Seq 


M00001394A:F01 


445 


4009 


80.A7.sp6:130214.Seq 


M00001396A:C03 


446 




80.B7.sp6:130226.Seq 


M00001400B:H06 


447 


39563 


80.C7.sp6:130238.Seq 


M00001402A:E08


448 


5556 


80.D7.sp6:130250.Seq 


M00001407B:D11


449 


9577 


80.E7.sp6:130262.Seq 


M00001409C:D12 


450 


7005 


80.F7.sp6:130274.Seq 


M00001410A:D07 


451 


8551 


80.G7.sp6:130286.Seq 


M00001412B:B10 


452 




80.H7.sp6:130298.Seq 


M00001414A:B01 


453 




80.A8.sp6:130215.Seq 


M00001414C:A07 


454 


13538 


80.B8.sp6:130227.Seq 


M00001415A:H06 


455 


8847 


80.C8.sp6:130239.Seq 


M00001416B:H11 


456 


36393 


80.D8.sp6:130251.Seq 


M00001417A:E02 


457 


9952 


80.E8.sp6:130263.Seq 


M00001418B:F03 


458 


9577 


80.G8.sp6:130287.Seq 


M00001421C:F01 


459 


15066 


80.H8.sp6:130299.Seq 


M00001423B:E07 


460 


10470 


80.A9.sp6:130216.Seq 


M00001424B:G09 


461 


22195 


80.B9.sp6:130228.Seq 


M00001425B:H08 


462 




80.C9.sp6:130240.Seq 


M00001426B:D12 


463 


4261 


80.D9.sp6:130252.Seq 


M00001426D:C08 


464 


84182 


80.E9.sp6:130264.Seq 


M00001428A:H10 


465 


40392 


80.H9.sp6:130300.Seq 


M00001429D:D07 


466 


16731 


80.C10.sp6:130241.Seq 


M00001442C:D07 


467 




80.D10.sp6:130253.Seq


M00001443B:F01 


468 


13532 


80.E10.sp6:130265.Seq 


M00001445A:F05 


469 


8 


80.H10.sp6:130301.Seq 


M00001448D:C09 


470 


36313 


80.A11.sp6:130218.Seq


M00001448D:H01 


471 


5857 


80.B11.sp6:130230.Seq


M00001449A:A12 


472 


41633 


80.C11.sp6:130242.Seq


M00001449A:B12 


473 


36535 


80.D11.sp6:130254.Seq


M00001449A:G10 


474 


86110 


80.E11.sp6:130266.Seq


M00001449C:D06 


475 


32663 


80.F11.sp6:130278.Seq


M00001450A:A11 


476 


27250 


80.G11.sp6:130290.Seq


M00001450A:D08 


477 


16970 


80.H11.sp6:130302.Seq


M00001452C:B06 


478 


16130 


80.A12.sp6:130219.Seq 


M00001453A:E11 


479 


16653 


80.B12.sp6:130231.Seq


M00001453C:F06 


480 


7005 


80.C12.sp6:130243.Seq 


M00001454B:C12 


481 


13072 


80.F12.sp6:130279.Seq 


M00001455B:E12 


482 


9283 


80.G12.sp6:130291.Seq 


M00001455D:F09 


483 


23255 


100.C1.sp6:131446.Seq


M00001343D:H07 


484 


13576 


100.E1.sp6:131470.Seq


M00001347A:B10 


485 


7187 


100.C2.sp6:131447.Seq 


M00001350A:H01


486 


14391 


100.E3.sp6:131472.Seq 


M00001355B:G10 


487 


945 


100.E4.sp6:131473.Seq 


M00001362C:H11 


488 


7172 


100.A5.sp6:131426.Seq 


M00001371C:E09 


489 


39648 


100.A6.sp6:131427.Seq 


M00001383A:C03 


490 


84182 


100.G9.sp6:131502.Seq 


M00001428A:H10 


491 


8 


100.B11.sp6:131444.Seq


M00001448D:C09


492 


36535 


100.D11.sp6:131468.Seq


M00001449A:G10 


493 


82498 


100.F11.sp6:131492.Seq


M00001450A:B12 


494 


16970 


100.C12.sp6:131457.Seq


M00001452C:B06 


495 


16130 


100.D12.sp6:131469.Seq


M00001453A:E11 


496 


7005 


121.D1.sp6:131917.Seq


M00001454B:C12 


497 




121.G6.sp6:131958.Seq 


M00001506D:A09 


498 


18957 


121.F7.sp6:131947.Seq 


M00001528A:F09 


499 


40044 


122.E1.sp6:132121.Seq


M00001621C:C08 


500 


5214 


122.C2.sp6:132098.Seq 


M00001630B:H09 


501 


6660 


122.B5.sp6:132089.Seq 


M00001679A:A06 


502 


13183 


123.D5.sp6:132305.Seq 


M00004114C:F11 


503 


6455 


123.E7.sp6:132319.Seq 


M00004157C:A09 


504 


5319 


123.F7.sp6:132331.Seq


M00004169C:C12 


505 


11443 


123.A8.sp6:132272.Seq 


M00004185C:C03 


506 




123.C8.sp6:132296.Seq 


M00004191D:B11 


507 


8210 


123.E8.sp6:132320.Seq 


M00004197D:H01 


508 


9457 


123.D11.sp6:132311.Seq


M00004307C:A06 


509 


6420 


172.E1.sp6:133925.Seq


M00001345A:E01 


510 


16245 


172.D2.sp6:133914.Seq 


M00001352A:E02 


511 


8078 


172.C3.sp6:133903.Seq


M00001353A:G12 


512 


14929 


172.D3.sp6:133915.Seq


M00001353D:D10 


513 


14391 


172.H3.sp6:133963.Seq


M00001355B:G10 


514 


6583 


172.B8.sp6:133896.Seq 


M00001394A:F01 


515 


4009 


172.D8.sp6:133920.Seq 


M00001396A:C03 


516 




172.B9.sp6:133897.Seq 


M00001400B:H06 


517 




176.A3.sp6:134514.Seq 


M00001632D:H07 


518 


19267 


176.G3.sp6:134586.Seq 


M00001645A:C12 


519 


78091 


176.G5.sp6:134588.Seq 


M00001679C:F01 


520 


17055 


176.D6.sp6:134553.Seq 


M00001682C:B12 


521 


6539 


176.D9.sp6:134556.Seq 


M00003844C:B11 


522 




177.H4.sp6:134791.Seq


M00004121B:G01 


523 


5257 


177.F5.sp6:134768.Seq 


M00004146C:C11 


524 


11494 


177.E6.sp6:134757.Seq 


M00004172C:D08 


525 




177.G7.sp6:134782.Seq 


M00004205D:F06 


526 


11451 


177.D8.sp6:134747.Seq 


M00004214C:H05 


527 


9283 


173.D2.SP6:134106.Seq 


M00001455D:F09 


528 


16283 


173.F3.SP6:134131.Seq 


M00001467A:D08 


529 


10539 


173.B5.SP6:134085.Seq 


M00001499B:A11 


530 


6420 


173.F5.SP6:134133.Seq 


M00001504D:G06 


531 


3956 


173.H5.SP6:134157.Seq 


M00001512D:G09 


532 




173.G7.SP6:134147.Seq 


M00001544A:E06 


533 


1577 


173.C9.SP6:134101.Seq


M00001556A:F11


534 


9635 


173.D9.SP6:134113.Seq 


M00001557A:F01 


535 


5192 


173.E9.SP6:134125.Seq 


M00001557B:H10 


536 


6539 


173.A12.SP6:134080.Seq 


M00001579D:C03 


537 


945 


180.C2.sp6:135940.Seq


M00001362C:H11 


538 


7005 


180.H5.sp6:136003.Seq


M00001410A:D07 


539 


39304 


180.G9.sp6:135995.Seq 


M00001450A:A02 


540 


27250 


180.B10.sp6:135936.Seq 


M00001450A:D08 


541 


35555 


184.A5.sp6:135530.Seq 


M00001528A:C04 


542 


19255 


184.B10.sp6:135547.Seq 


M00001545A:C03 


543 


6268 


184.C12.sp6:135561.Seq 


M00001551A:B10


544 


3277 


217.E1.sp6:139406.Seq


M00001624A:B06 


545 


39171 


217.A12.sp6:139369.Seq 


M00001644C:B07 


546 


11460 


219.F2.sp6:139035.Seq 


M00001676B:F05 


547 


10539 


219.F6.sp6:139039.Seq 


M00001680D:F08 


548 


11476 


219.H8.sp6:139065.Seq 


M00003747D:C05 


549 


4016 


79.A1.sp6:130016.Seq


M00001395A:C03 


550 


7674 


79.C1.sp6:130040.Seq


M00001416A:H01 


551 


3681 


79.E1.sp6:130064.Seq


M00001449A:D12 


552 


39304 


79.F1.sp6:130076.Seq


M00001450A:A02 


553 


82498 


79.G1.sp6:130088.Seq


M00001450A:B12


554 


84328 


79.A2.sp6:130017.Seq


M00001452A:B04 


555 


86859 


79.B2.sp6:130029.Seq 


M00001452A:B12 


556 


1120 


79.C2.sp6:130041.Seq


M00001452A:D08 


557 


85064 


79.D2.sp6:130053.Seq 


M00001452A:F05 


558 


83103 


79.G2.sp6:130089.Seq 


M00001454A:A09 


559 


10145 


79.F3.sp6:130078.Seq 


M00001465A:B11


560 


16283 


79.H3.sp6:130102.Seq 


M00001467A:D08 


561 


4568 


79.D4.sp6:130055.Seq 


M00001513A:B06 


562 


4313 


79.F4.sp6:130079.Seq 


M00001517A:B07 


563 


2428 


79.A5.sp6:130020.Seq 


M00001533A:C11 


564 


39423 


79.C5.sp6:130044.Seq 


M00001535A:F10 


565 


39174 


79.E5.sp6:130068.Seq 


M00001541A:H03


566 


22113


79.F5.sp6:130080.Seq 


M00001542A:A09 


567 


19829 


79.H5.sp6:130104.Seq 


M00001544A:G02 


568 


13864 


79.B6.sp6:130033.Seq


M00001545A:D08 


569 


1058 


79.F6.sp6:130081.Seq 


M00001548A:H09 


570 


4015 


79.G6.sp6:130093.Seq 


M00001549A:B02 


571 


39180 


79.A7.sp6:130022.Seq 


M00001551A:F05 


572 


307 


79.C7.sp6:130046.Seq 


M00001552A:B12 


573 


39458 


79.D7.sp6:130058.Seq 


M00001552A:D11 


574 


39490 


79.G7.sp6:130094.Seq 


M00001557A:F03 


575 


39486 


79.B8.sp6:130035.Seq 


M00001561A:C05 


576 


39380 


79.E8.sp6:130071.Seq


M00001587A:B11 


577 


1399 


79.G8.sp6:130095.Seq


M00001604A:B10 


578 


39391 


79.A9.sp6:130024.Seq 


M00001604A:F05 


579 


6268 


79.G9.sp6:130096.Seq 


M00001551A:B10 


580 




377.F4.sp6:141957.Seq 


M00004692A:H08 


581 


2448 


89.A1.sp6:130667.Seq


M00001460A:F06 


582 


1531 


89.C1.sp6:130691.Seq


M00001461A:D06 


583 


19 


89.D1.sp6:130703.Seq


M00001463C:B11 


584 


38759 


89.F1.sp6:130727.Seq


M00001467A:B07 


585 


39508 


89.G1.sp6:130739.Seq


M00001467A:D04 


586 


16283 


89.H1.sp6:130751.Seq


M00001467A:D08 


587 


39442 


89.A2.sp6:130668.Seq 


M00001467A:E10 


588 


7589 


89.B2.sp6:130680.Seq 


M00001468A:F05 


589 




89.C2.sp6:130692.Seq 


M00001469A:A01 


590 


12081 


89.D2.sp6:130704.Seq 


M00001469A:C10 


591 


19105 


89.E2.sp6:130716.Seq 


M00001469A:H12 


592 


1037 


89.F2.sp6:130728.Seq 


M00001470A:B10 


593 


39425 


89.G2.sp6:130740.Seq 


M00001470A:C04 


594 


39478 


89.H2.sp6:130752.Seq 


M00001471A:B01 


595 




89.B3.sp6:130681.Seq 


M00001487B:H06 


596 




89.C3.sp6:130693.Seq 


M00001488B:F12 


597 


18699 


89.D3.sp6:130705.Seq 


M00001490B:C04 


598 


7206 


89.E3.sp6:130717.Seq 


M00001494D:F06 


599 


2623 


89.F3.sp6:130729.Seq 


M00001497A:G02 


600 


10539 


89.G3.sp6:130741.Seq 


M00001499B:A11 


601 


5336 


89.H3.sp6:130753.Seq 


M00001500A:C05 


602 


2623 


89.A4.sp6:130670.Seq 


M00001500A:E11 


603 


9443 


89.B4.sp6:130682.Seq 


M00001500C:E04 


604 


9685 


89.C4.sp6:130694.Seq 


M00001501D:C02 


605 




89.D4.sp6:130706.Seq 


M00001504A:E01 


606 


10185 


89.E4.sp6:130718.Seq 


M00001504C:A07 


607 


6974 


89.F4.sp6:130730.Seq 


M00001504C:H06 


608 


6420 


89.G4.sp6:130742.Seq 


M00001504D:G06 


609 




89.H4.sp6:130754.Seq 


M00001505C:C05 


610 




89.A5.sp6:130671.Seq 


M00001506D:A09 


611 


39168 


89.B5.sp6:130683.Seq 


M00001507A:H05 


612 


39412 


89.C5.sp6:130695.Seq 


M00001511A:H06 


613 


39186 


89.D5.sp6:130707.Seq 


M00001512A:A09 


614 


3956 


89.E5.sp6:130719.Seq 


M00001512D:G09 


615 




89.F5.sp6:130731.Seq 


M00001513B:G03 


616 


14364 


89.G5.sp6:130743.Seq 


M00001513C:E08 


617 


40044 


89.H5.sp6:130755.Seq 


M00001514C:D11 


618 


8952 


89.A6.sp6:130672.Seq 


M00001518C:B11 


619 


35555 


89.B6.sp6:130684.Seq 


M00001528A:C04 


620 


18957 


89.C6.sp6:130696.Seq 


M00001528A:F09 


621 


8358 


89.D6.sp6:130708.Seq 


M00001528B:H04 


622 


38085 


89.E6.sp6:130720.Seq 


M00001531A:D01 


623 




89.F6.sp6:130732.Seq 


M00001531A:H11 


624 


3990 


89.G6.sp6:130744.Seq 


M00001532B:A06 


625 


16921 


89.H6.sp6:130756.Seq 


M00001534A:C04 


626 


5321 


89.B7.sp6:130685.Seq 


M00001534A:F09 


627 


4119 


89.C7.sp6:130697.Seq 


M00001534C:A01 


628 


20212 


89.E7.sp6:130721.Seq 


M00001535A:C06 


629 


2696 


89.F7.sp6:130733.Seq 


M00001536A:B07 


630 


39392 


89.G7.sp6:130745.Seq 


M00001536A:C08 


631 


39420 


89.H7.sp6:130757.Seq 


M00001537A:F12 


632 


3389 


89.A8.sp6:130674.Seq 


M00001537B:G07 


633 


8286 


89.B8.sp6:130686.Seq 


M00001540A:D06 


634 


3765 


89.C8.sp6:130698.Seq 


M00001541A:D02 


635 


39453 


89.E8.sp6:130722.Seq 


M00001542A:E06 


636 




89.F8.sp6:130734.Seq 


M00001542B:B01 


637 




89.H8.sp6:130758.Seq 


M00001544A:E06 


638 


6974 


89.A9.sp6:130675.Seq 


M00001544B:B07 


639 




89.B9.sp6:130687.Seq 


M00001545A:B02 


640 


19255 


89.C9.sp6:130699.Seq 


M00001545A:C03 


641 


1267 


89.D9.sp6:130711.Seq 


M00001546A:G11 


642 


5892 


89.E9.sp6:130723.Seq 


M00001548A:E10 


643 


4193 


89.G9.sp6:130747.Seq 


M00001549B:F06 


644 


16347 


89.H9.sp6:130759.Seq 


M00001549C:E06 


645 


7239 


89.A10.sp6:130676.Seq 


M00001550A:A03


646 


5175 


89.B10.sp6:130688.Seq 


M00001550A:G01 


647 


22390 


89.C10.sp6:130700.Seq 


M00001551A:G06 


648 


3266 


89.D10.sp6:130712.Seq 


M00001551C:G09 


649 


5708 


89.E10.sp6:130724.Seq 


M00001552B:D04 


650 




89.F10.sp6:130736.Seq 


M00001552D:A01


651 


8298 


89.G10.sp6:130748.Seq 


M00001553A:H06


652 


4573 


89.H10.sp6:130760.Seq 


M00001553B:F12 


653 


22814 


89.A11.sp6:130677.Seq


M00001553D:D10 


654 


39539 


89.B11.sp6:130689.Seq


M00001555A:B02 


655 


39195 


89.C11.sp6:130701.Seq


M00001555A:C01 


656 


4561 


89.D11.sp6:130713.Seq


M00001555D:G10


657 


9244 


89.E11.sp6:130725.Seq


M00001556A:C09 


658 


1577 


89.F11.sp6:130737.Seq


M00001556A:F11 


659 


4386 


89.H11.sp6:130761.Seq


M00001556B:C08 


660 


11294 


89.A12.sp6:130678.Seq 


M00001556B:G02 


661 


5192 


89.D12.sp6:130714.Seq 


M00001557B:H10 


662 


8761 


89.E12.sp6:130726.Seq 


M00001557D:D09 


663 




89.F12.sp6:130738.Seq 


M00001558A:H05 


664 


7514 


89.G12.sp6:130750.Seq 


M00001558B:H11 


665 




89.H12.sp6:130762.Seq 


M00001559B:F01 


666 


6558 


90.A1.sp6:130859.Seq


M00001560D:F10 


667 


102 


90.B1.sp6:130871.Seq


M00001563B:F06 


668 




90.D1.sp6:130895.Seq


M00001566B:D11 


669 


5749 


90.E1.sp6:130907.Seq


M00001571C:H06 


670 


6539 


90.G1.sp6:130931.Seq


M00001579D:C03 


671 


6293 


90.A2.sp6:130860.Seq


M00001583D:A10 


672 




90.C2.sp6:130884.Seq 


M00001590B:F03 


673 


260 


90.D2.sp6:130896.Seq 


M00001594B:H04 


674 


4837 


90.E2.sp6:130908.Seq 


M00001597C:H02


675 


10470 


90.F2.sp6:130920.Seq 


M00001597D:C05 


676 


16999 


90.G2.sp6:130932.Seq 


M00001598A:G03 


677 


22794 


90.H2.sp6:130944.Seq 


M00001601A:D08 


678


11465


90.A3.sp6:130861.Seq


M00001607A:E11


679


7802


90.B3.sp6:130873.Seq


M00001608A:B03


680


22155


90.C3.sp6:130885.Seq


iVlJUUU IOUOO.IjUj


681




90.D3.sp6:130897.Seq


iVXUUl/U lOUOL/.Ai I


682


13157


90.E3.sp6:130909.Seq


1Y1UUUU 1 0 1 ^L- .r 1 u


683


17004


90.F3.sp6:130921.Seq


\4nnnn 1A1 ir*v(Y)


684


40314


90.G3.sp6:130933.Seq


M00001619C:F12


685


40044


90.H3.sp6:130945.Seq


M00001621C:C08


686


13913


90.A4.sp6:130862.Seq




687


3277


90.B4.sp6:130874.Seq


M00001624A:B06


688


4309


90.C4.sp6:130886.Seq


M00001624C:F01


689


5214


90.D4.sp6:130898.Seq


M00001630B:H09


690




90.E4.sp6:130910.Seq


M00001632D:H07


691


39171


90.F4.sp6:130922.Seq


M00001644C:B07


692


19267


90.G4.sp6:130934.Seq


M00001645A:C12


693


4665


90.H4.sp6:130946.Seq


M00001648C:A01


694




90.A5.sp6:130863.Seq


MUUUU 1 Oj 1 A:rlU 1


695


23201


90.B5.sp6:130875.Seq


MUUUU i OD /U.CU3


696


76760


90.C5.sp6:130887.Seq


MUUUU 1 o!> /D.rUo


697


23218


90.D5.sp6:130899.Seq


M00001662C:A09


698


35702


90.E5.sp6:130911.Seq


M00001663A:E04


699


6468


90.F5.sp6:130923.Seq


M00001669B:F02


700


14367


90.G5.sp6:130935.Seq


M00001670C:H02


701


7015


90.H5.sp6:130947.Seq


M00001673C:H02


702


8773


90.A6.sp6:130864.Seq


M00001675A:C09


703


11460


90.B6.sp6:130876.Seq


M00001676B:F05


704


7570


90.D6.sp6:130900.Seq


M00001677D:A07


705


4416


90.E6.sp6:130912.Seq


M00001678D:F12


706


6660


90.F6.sp6:130924.Seq


M00001679A:A06


707




90.H6.sp6:130948.Seq


M00001679A:F06


708


26875


90.A7.sp6:130865.Seq


M00001679A:F10


709


6298


90.B7.sp6:130877.Seq


M00001679B:F01


710


78091


90.C7.sp6:130889.Seq


M00001679C:F01


711


10751


90.D7.sp6:130901.Seq


M00001679D:D03


712


10539


90.F7.sp6:130925.Seq


M00001680D:F08


713


17055


90.G7.sp6:130937.Seq


M00001682C:B12


714


5382


90.A8.sp6:130866.Seq


M00001688C:F09


715 


4393 


90.B8.sp6:130878.Seq 


M00001693C:G01


716 


67252 


90.C8.sp6:130890.Seq 


M00001716D:H05 


717 


40108 


90.D8.sp6:130902.Seq 


M00003741D:C09 


718 


1 1476 


yu.bo.spo. J j\jy I4.beq 


\ ^000077/1 71^*^0^ 
MUUUU3 /4 / JJ.CUj 


719 

iky 




yu.ro.spo.l JUyzo.oeq 


ivyfnnnoi 7^/1^-^00 

MUUUU3 /j4L.Dliy 


720 


697 


yu.Uo.spo. i3uy3o.beq 


A/f nnnm 7 ^on«T5 no 

muuuu3 /jyr>.t)Uy 


721 




90.H8.sp6:130950.Seq


\yrnnno / 27^ in«A no 

MUUUU3 / OlJJ.AUy 


722 


17076 


90.A9.sp6:130867.Seq


\/fnnnnQ7^or ,, «tjnQ 
MUUUU3 /oZCJdUo 


723 


3108 


90.B9.sp6:130879.Seq


iv/fnnnn'3 7/iT a -un^ 
NIWkjVj /03A.rUo 


724 


67907 


90.C9.sp6:130891.Seq


a ^nnnnQ 77/ r* * a 01 
MUUUU3 / /4L.AU3 


725 




90.D9.sp6:130903.Seq


MUUUU3 /o4D.JJlz 


726 


1 1350 


90.F9.sp6:130927.Seq


"\ jTnAAA'3 oo^ct> . a n^c 

MU0UU3 826B : AUo 


727 


7899 

90.H9.sp6:130951.Seq


X IAAAA1 OO^Hi AA1 

MUUUU353 /D:AU1 


728 



7798 


90.A10.sp6:130868.Seq


\jsf\nnf\i 000 a .r\Ao 
MUUUU3 o39A.UUo 


729 


6539 

90.B10.sp6:130880.Seq


M00003844C:B11


730 


6874 


90.C10.sp6:130892.Seq


MUUUU3 o4ob .DUo 


731 




90.D10.sp6:130904.Seq


M00003851B:D08


732 


13595 

90.E10.sp6:130916.Seq


M00003851B:D10


733 


5619 


90.F10.sp6:130928.Seq


M00003853A:D04


734 


10515 


90.G10.sp6:130940.Seq



M00003853A:F12 


735 


4622 


90.H10.sp6:130952.Seq


X K AAA AO O C/TT) ,PAO 

MU00U385oB:CU2 


736


33RQ 

JJ07 


90.A11.sp6:130869.Seq


M00003857A:G10


737


4718 


90.B11.sp6:130881.Seq


M00003857A:H03


738 




90.C11.sp6:130893.Seq


M00003867A:D10


739 


12977 

90.F11.sp6:130929.Seq


M00003875B:F04 


740 


8479 


90.G11.sp6:130941.Seq


M00003875C:G07


741 




90.H11.sp6:130953.Seq


M00003875D:D11


742 


7798 


90.A12.sp6:130870.Seq


X AAAAAO 0'7/C'T\.T? 1 O 

MU0U03 876D:bl 2 


743 


5345 


90.B12.sp6:130882.Seq


A i AAA m 070D -P 1 1 

MUUUU3o /9B.L1 1 


744


31587 


90.C12.sp6:130894.Seq


X AAA AAO DOT! 15 ."P\ 1 A 

MUUUU3 o / y B :L> 1 U 


745 


14507 


90.D12.sp6:130906.Seq


MUUUU38 /9U:AUz 


746 



13576 

90.F12.sp6:130930.Seq


M00003885C:A02


747 




90.G12.sp6:130942.Seq


MUUUU3oyiC:HU9 


748 


9285 


90.H12.sp6:130954.Seq


AMAAAIAA^P.DI A 

MuUVVj 9UoC :b 1 U 


749 


39809 


99.A1.sp6:131230.Seq


M00003907D:A09


750 


16317 


99.B1.sp6:131242.Seq


x a n n nn"> on7T^ .urn a 
MUUUUj9U / D:riU4 


751 


8672 


99.C1.sp6:131254.Seq


M00003909D:C03


752 


12532 


99.D1.sp6:131266.Seq


X AAAAAO 0 1 OT> .T~\A 1 

MUUUU3 9 1 zB :ui) 1 


753 



3900 

99.E1.sp6:131278.Seq


M00003914C:F05


754 


23255 


99.F1.sp6:131290.Seq


M00003922A:E06 


755 


24488 


99.C2.sp6:131255.Seq 


M00003968B:F06 


756 


40122 


99.D2.sp6:131267.Seq 


M00003970C:B09 


757 


23210 


99.E2.sp6:131279.Seq 


M00003974D:E07 
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SEQ ID NO: Cluster ID 


Sequence Name 


Clone Name 


758


Z J J JO 


99.F2.sp6:131291.Seq


M00003974D:H02 


759




99.A3.sp6:131232.Seq 


M00003981A:E10


760 




99.B3.sp6:131244.Seq


M00003982C:C02 


761 


010^ 



99.C3.sp6:131256.Seq 



M00003983A:A05 


762


U 1 Zt- 


99.D3.sp6:131268.Seq 


M00004028D:A06 


763


40071 


99.E3.sp6:131280.Seq

M00004028D:C05 


764 


J / ZO J 


99.H3.sp6:131316.Seq

M00004035C:A07 


765


1 7016 


99.A4.sp6:131233.Seq 


M00004035D:B06 


766 


1706 


99.C4.sp6:131257.Seq 


M00004068B:A01


767 




99.D4.sp6:131269.Seq 


M00004072A:C03 


768 


1 ^060 


99.F4.sp6:131293.Seq 


M00004081C:D10


769


0?$K 
yzoj 


99.H4.sp6:131317.Seq 


M00004086D:G06 


770 


68£0 
OooU 


99.A5.sp6:131234.Seq 


M00004087D:A01 


771 





99.C5.sp6:131258.Seq 


M00004093D:B12 


772


/ZZ 1 


99.D5.sp6:131270.Seq 


M00004105C:A04 


773


4017 


99.E5.sp6:131282.Seq 


M00004108A:E06 


774 


05 / *f 


99.F5.sp6:131294.Seq 


M00004111D:A08 


775


1 11 81 


99.G5.sp6:131306.Seq 


M00004114C:F11


776 




99.H5.sp6:131318.Seq


M00004121B:G01


777 


1 1079 
1 JZ /Z 


99.A6.sp6:131235.Seq 


M00004138B:H02 


778 


^0S7 


99.B6.sp6:131247.Seq


M00004146C:C11 


779


64^ 


99.D6.sp6:131271.Seq


M00004157C:A09 


780




99.E6.sp6:131283.Seq


M00004169C:C12 


781


4008 


99.F6.sp6:131295.Seq


M00004171D:B03 


782


1 1404 


99.G6.sp6:131307.Seq 



M00004172C:D08 


783


1 1441 


99.A7.sp6:131236.Seq 



M00004185C:C03 


784 





99.B7.sp6:131248.Seq 


M00004191D:B11 


785


R010 
oz 1U 


99.C7.sp6:131260.Seq 


M00004197D:H01 


786 


1411 1 


99.D7.sp6:131272.Seq

M00004203B:C12 


787 




99.E7.sp6:131284.Seq


M00004205D:F06 


788 


12071 


99.B8.sp6:131249.Seq

M00004223D:E04 


789 


6455 


99.C8.sp6:131261.Seq


M00004229B:F08 


790 


7212 


99.D8.sp6:131273.Seq

M00004230B:C07 


791 


4905 


99.H8.sp6:131321.Seq

M00004269D:D06 


792 


16914 


99.A9.sp6:131238.Seq

M00004275C:C11 


793


1 6001 

1 O^Zl 


99.D9.sp6:131274.Seq


M00004295D:F12 


794 


13046 


99.E9.sp6:131286.Seq 


M00004296C:H07 


795 


9457 


99.F9.sp6:131298.Seq 


M00004307C:A06


796 


26295 


99.G9.sp6:131310.Seq 


M00004312A:G03 


797 


21847 


99.H9.sp6:131322.Seq 


M00004318C:D10 
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SEQ ID NO: Cluster ID Sequence Name Clone Name 

798 99.H10.sp6:131323.Seq M00004505D:F08
799 99.B11.sp6:131252.Seq M00004692A:H08
800 99.D11.sp6:131276.Seq M00005180C:G03
801 39304 RTA00000118A.j.21.1.Seq_THC151859
802 2428 RTA00000123A.1.21.1.Seq_THC205063
803 1058 RTA00000126A.e.20.3.Seq_THC217534
804 5097 RTA00000134A.k.1.1.Seq_THC215869
805 20212 RTA00000134A.1.22.1.Seq_THC128232
806 23255 RTA00000177AF.e.14.3.Seq_THC228776
807 2790 RTA00000177AF.e.2.1.Seq_THC229461
808 6420 RTA00000177AF.f.10.3.Seq_THC226443
809 4059 RTA00000177AF.n.18.3.Seq_THC123051
810 RTA00000179AF.j.13.1.Seq_THC105720
811 9952 RTA00000180AF.c.20.1.Seq_THC162284
812 13238 RTA00000181AF.m.4.1.Seq_THC140691
813 9685 RTA00000183AF.c.11.1.Seq_THC109544
814 RTA00000183AF.c.24.1.Seq_THC125912
815 6420 RTA00000183AF.d.11.1.Seq_THC226443
816 6974 RTA00000183AF.d.9.1.Seq_THC223129
817 40044 RTA00000183AF.g.22.1.Seq_THC232899
818 RTA00000183AF.g.9.1.Seq_THC198280
819 5892 RTA00000184AF.d.11.1.Seq_THC161896
820 40044 RTA00000186AF.d.1.1.Seq_THC232899
821 RTA00000186AF.h.14.1.Seq_THC112525
822 19267 RTA00000186AF.1.12.1.Seq_THC178183
823 8773 RTA00000187AF.f.24.1.Seq_THC220002
824 7570 RTA00000187AF.g.24.1.Seq_THC168636
825 11476 RTA00000187AF.p.19.1.Seq_THC108482
826 RTA00000188AF.d.11.1.Seq_THC212094
827 17076 RTA00000188AF.d.21.1.Seq_THC208760
828 697 RTA00000188AF.d.6.1.Seq_THC178884
829 67907 RTA00000188AF.g.11.1.Seq_THC123222
830 5619 RTA00000188AF.1.9.1.Seq_THC167845
831 4718 RTA00000189AF.g.5.1.Seq_THC196102
832 39809 RTA00000190AF.e.3.1.Seq_THC150217
833 23255 RTA00000190AF.j.4.1.Seq_THC228776
834 40122 RTA00000190AF.n.23.1.Seq_THC109227
835 23210 RTA00000190AF.o.20.1.Seq_THC207240
836 23358 RTA00000190AF.o.21.1.Seq_THC207240
837 5693 RTA00000190AF.p.17.2.Seq_THC173318
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SEQ ID NO: Cluster ID Sequence Name Clone Name 



838 


2433 


RTA00000191AF.a.15.2.Seq_THC79498


839 


5257 


RTA00000192AF.f.3.1.Seq_THC213833


840 


16392 


RTA00000192AF.l.1.1.Seq_THC202071


841 




RTA00000193AF.c.21.1.Seq_THC222602


842 


26295 


RTA00000193AF.i.24.2.Seq_THC197345 


843 




RTA00000193AF.m.5.1.Seq_THC173318


844 




RTA00000193AF.n.15.1.Seq_THC215687
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■ ■ ■ 



Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 




ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE


1 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>


2 


<NONE> 


<NONE> 


<NONE>


<NONE> 


<NONE>


<NONE>


3 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>


4 


<NONE> 


<NONE> 


<NONE> 


BAR3_CHITE 


BALBIANI RING 
PROTEIN 3 

PRECURSOR>PIR2:S08
167 Balbiani ring 3
protein - midge
(Chironomus
tentans)>GP:CTBR3_1
C;tentans balbiani ring 3
(BR3) gene




5 


<NONE> 


<NONE> 


<NONE> 


CYAA_PODA
N 


ADENYLATE 
CYCLASE (EC 4.6.1.1) 
(ATP 

PYROPHOSPHATE- 
LYASE) (ADENYLYL 
CYCLASE)>PIR2:JC47 
47 adenylate cyclase (EC 

4.6.1.1) -Podospora 
anserina>GP:PANADCY 

1 Podospora anserina 
adenyl cyclase gene, 
exons 1-4 


0.97


6 


<NONE> 


<NONE> 


<NONE> 


VP03_HSVSA


PROBABLE 

ANTIGEN 3 
(TEGUMENT 
PROTEIN)>PIR2:C3680 
6 hypothetical protein 
ORF3 - saimiriine
herpesvirus 1 (strain 
11)>GP:HSGEND_3 
Herpesvirus saimiri 
complete genome DNA; 
ORF 03; similarity to 
ORF 75 and EBV 
BNRF1 
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1 

\ 


Nearest
Neighbor
(BlastN vs.
Genbank)


Nearest
Neighbor
(BlastX vs.
Non-
Redundant
Proteins)








SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 




7 


<NONE> 


<NONE> 


<NONE> 


ATFCA2J8 


Arabidopsis thaliana
DNA chromosome 4,
ESSA I contig fragment
No; 2; Hydroxyproline-
rich glycoprotein
homolog; Similarity to
hydroxyproline-rich
glycoprotein precursor -
common tobacco


0.93

8 


<NONE> 


<NONE> 


<NONE> 


DHAL_ASPN 

VJ 


ALDEHYDE
DEHYDROGENASE
(EC 1.2.1.3)
(ALDDH)>GP:ASNALD
AA_1 Aspergillus niger
aldehyde dehydrogenase
(aldA) gene, complete
cds


0.9


9 


<NONE> 


<NONE> 


<NONE> 


NCU50264_1 


Neurospora crassa two- 
component histidine 
kinase (nik-1) gene, 5'
region and partial cds 


0.86 




10 


<NONE> 


<NONE> 


<NONE> 


NEUG_BOVI 

N 


NEUROGRANIN (P17)

IMMUNOREACTIVE
C-KINASE

SUBSTRATE) (BICKS)
(FRAGMENT)>PIR2:A3
9034 neurogranin - 
bovine (fragment) 


0.82 




11 


<NONE> 


<NONE>


<NONE> 


HUMBYSTIN 
_1 


Homo sapiens bystin 
mRNA, complete cds 


0.81 




12 


<NONE> 


<NONE> 


<NONE> 


BTBMP1_1 


Bos taurus BMP1 gene, 
partial sequence; Bone 
morphogenetic protein 1 


0.69 




13 


<NONE> 


<NONE> 


<NONE> 


TCCYSPROT 
_1 


T;congolense mRNA for 
(prepro) cysteine 
proteinase 


0.56 
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BWlllifi 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 

Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


14 


<NONE> 


<NONE> 


<NONE> 


P60_LISIV 


PROTEIN P60 
PRECURSOR 
(INVASION- 
ASSOCIATED 
PROTEIN)>GP:LISIAP 

RELB_1 Listeria 
ivanovii extracellular 
protein homologue (iap) 
gene, complete cds 


0.15 


15 


<NONE> 


<NONE> 


<NONE> 


HEX_ADE31 


HEXON PROTEIN 
(LATE PROTEIN 2) 
(FRAGMENT)>PIR2:S3 
7217 hexon protein - 
human adenovirus 31
(fragment)>GP:HSAT31
H_1 H;sapiens
adenovirus type 31 hexon 
gene; Hexon protein; 
Internal fragment 
containing hypervariable 
regions 


0.15 


16 


<NONE> 


<NONE> 


<NONE> 


HSU77493_1 


Human Notch2 mRNA, 
partial cds; 

Transmembrane protein; 

hN 


0.13 


17 


<NONE> 


<NONE> 


<NONE> 


CYB_PARTE 


CYTOCHROME B (EC 
1.10.2.2)>PIR2:S07743 

cytochrome b - 
Paramecium tetraurelia 

mitochondrion 
(SGC6)>GP:MIPAGEN_ 

19 Paramecium aurelia
mitochondrial complete 
genome; Apocytochrome 
b (AA 1-391) 


0.078 


18 


<NONE> 


<NONE> 


<NONE> 


HUMERB27 
1 


Human c-erbB-2 gene, 
exon 7; C-erb-2 protein 


0.054 
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■m 



Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


19 



<NONE> 


<NONE> 


<NONE> 


DMTRX11I_2 


D;melanogaster DNA for
trxI and trxII genes;
Trithorax protein trxI;
Trithorax; 

putative>GP:DMTTHOR 
AX_2 D;melanogaster 
DNA for (putative) 
trithorax protein; 
Predicted trithorax 
protein 




20 


<NONE> 


<NONE> 


<NONE> 


CELB0281_5 


Caenorhabditis elegans 
cosmid B0281; Similar to 
reverse transcriptases 




21 


<NONE> 


<NONE> 


<NONE> 


MOTY_VIBP
A 


SODIUM-TYPE 
FLAGELLAR PROTEIN 
MOTY 

PRECURSOR>GP:VPU 
06949_4 Vibrio 
parahaemolyticus BB22 
RNase T (rnt) gene and 
flagellar motor 
component (motY) gene, 
complete cds 


0.041 


22 


<NONE> 


<NONE> 


<NONE> 


A56263 


beta-galactosidase (EC 
3.2.1.23) isozyme 12- 
Arthrobacter sp. (strain 
B7)>GP:ASU17417_1
Arthrobacter sp; beta- 
galactosidase gene, 
complete cds 


0.04 
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:--- >:: <V;>^^:-;:W' : is-, : :: 






SEQ 
ID 



Nearest 
Neighbor 
(BlastN vs. 
Genbank) 



ACCESSION 



DESCRIPTION 



P 

VALUE 



Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 



ACCESSION 



DESCRIPTION 



P 

VALUE 



23


<NONE>


<NONE>


<NONE>


GSA_PSEAE


GLUTAMATE-1-
SEMIALDEHYDE 2,1-
AMINOMUTASE (EC
5.4.3.8) (GSA)
(GLUTAMATE-1-
SEMIALDEHYDE
AMINOTRANSFERASE)
(GSA-AT)>PIR2:S57898
glutamate 1-
semialdehyde 2,1-
aminomutase -
Pseudomonas
aeruginosa>GP:PAHEM
L_1 P;aeruginosa hemL
gene; Glutamate 1-sem


0.038


24


<NONE>


<NONE>


<NONE>


S16323


hypothetical protein -
Arabidopsis
thaliana>GP:ATHB1_1
A;thaliana homeobox
gene Athb-1 mRNA;
Open reading frame


0.035



25 



<NONE> 



<NONE> 



<NONE>



IRS1_RAT



INSULIN RECEPTOR 
SUBSTRATE- 
1>PIR2:S16948 
hypothetical protein IRS- 
1 - 

rat>GP:RNIRSHRM_1
R;Norvegicus IRS-1 
mRNA for insulin- 
receptor; During insulin 
stimulation, undergoes 
tyrosine phosphorylation 
and binds 

phosphatidylinositol 3- 
kinase 



0.027 
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ill ifg 






SEQ 
ID 

Nearest
Neighbor
(BlastN vs.
Genbank)


ACCESSION


DESCRIPTION


P

VALUE


Nearest
Neighbor
(BlastX vs.
Non-
Redundant
Proteins)


ACCESSION


DESCRIPTION


P

VALUE


26


<NONE>


<NONE>


<NONE>


CEM02G9_2


Caenorhabditis elegans
cosmid M02G9;
M02G9;1; Similar to
keratin like protein;
cDNA EST yk308g11;5
comes from this gene;
cDNA EST yk208e11;5
comes from this gene;
cDNA EST yk208e11;3
comes



27 



<NONE> 



<NONE> 



<NONE> 



S75490_3



competence region: 
iga=IgA protease, 
comA=transformation 
competence [Neisseria 
gonorrhoeae, MS11,
Genomic, 3 genes, 2664 
nt] 



0.0041 



28 



<NONE> 



<NONE> 



<NONE> 



EXTN_TOBA 
C 



EXTENSIN 
PRECURSOR (CELL 
WALL 

HYDROXYPROLINE- 
RICH 

GLYCOPROTEIN)>PIR 
2:S06733 

hydroxyproline-rich 
glycoprotein precursor - 
common 

tobacco>GP:NTEXT_1
Tobacco HRGPnt3 gene
for extensin; Extensin
(AA 1-620)



0.0025 



29 



<NONE> 



<NONE> 



<NONE> 



HPCEGS_1



Hepatitis C virus 
complete genome 
sequence; Polyprotein 



0.0014 



30 



<NONE> 



<NONE> 



<NONE> 



HHVBC_4



Human hepatitis virus 
(genotype C, HMA) 
preS1, preS2, S, C, X,
antigens, core antigen, X 
protein and polymerase 



0.00093 
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s : sf : ;S:4!}- : - : £sE;i is; 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


31 


<NONE> 


<NONE> 


<NONE> 


HSLTGFBP4 
1 


Homo sapiens mRNA for 
latent transforming 
growth factor-beta 
binding protein-4; Latent 
TGF-beta binding 
protein-4 


0.00061 


32 


<NONE> 


<NONE> 


<NONE> 


S74909 


transposase - 
Synechocystis sp. (PCC 
6803)>GP:D90909J08 
Synechocystis sp; 
PCC6803 complete 
genome, 11/27, 1311235- 
1430418; Transposase; 
ORF_ID:slr2062


0.00051 


33 


<NONE> 


<NONE> 


<NONE> 


GRN_MOUS
E 


GRANULINS 
PRECURSOR 
(ACROGRANIN)>GP:M 
USAP_1 Mouse gene for 
acrogranin precursor, 
complete cds 


0.00022 


34 


<NONE> 


<NONE> 


<NONE> 


CA21_MOUS
E 


PROCOLLAGEN 
ALPHA 2(I) CHAIN
PRECURSOR>PIR2:A4
3291 collagen alpha 2(I)
chain precursor -
mouse>GP:MMCOL1A2
_1 Mouse COL1A2
mRNA for pro-alpha-2(I)
collagen 


0.00016 


35 


<NONE> 


<NONE> 


<NONE> 


MMMHC29N 
7_2 


Mus musculus major 
histocompatibility locus 
class III 

region:butyrophilin-like 
protein gene, partial cds; 
Notch4, PBX2, RAGE, 
lysophatidic acid acyl 
transferase-alpha, 
palmitoyl- 


8.00E- 
05 
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I 




Nearest
Neighbor
(BlastN vs.
Genbank)


Nearest
Neighbor
(BlastX vs.
Non-
Redundant
Proteins)


SEQ
ID


ACCESSION 


DESCRIPTION 


P

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 




36 


<NONE> 


<NONE> 


<NONE> 


NFH_RAT 


NEUROFILAMENT 
(200 KD 

NEUROFILAMENT 

PROTEIN) (NF-H) 
(FRAGMENT) 


2.40E-
05


37 


<NONE> 


<NONE> 


<NONE> 


HUMVWFM 
1 


Human von Willebrand
factor mRNA, 3' end;
Von Willebrand factor
prepropeptide


1.70E-
05


38 


<NONE>


<NONE> 


<NONE> 


CGHU2E 


collagen alpha 2(XI) 
chain - human (fragment) 


2.00E- 
06 




39 


<NONE> 


<NONE> 


<NONE>


A C 1 1 Ol 


hypothetical protein
(sdsB region) - 
Pseudomonas sp. 


4.00E-
08


40 


<NONE> 


<NONE> 


<NONE> 


YM8L_YEAS
T 


HYPOTHETICAL 71.1
KD PROTEIN IN DSK2-
CAT8 INTERGENIC
REGION>PIR2:S54585
hypothetical protein
YMR278w - yeast
(Saccharomyces
cerevisiae)>GP:SC8021
X_4 S;cerevisiae
chromosome XIII cosmid
8021; Unknown;
YM8021;04, unknown,
len:622, CAI:0;16,


1.50E-
09






41 


<NONE> 


<NONE> 


<NONE> 


MTCY210_31 


Mycobacterium 
tuberculosis cosmid 
Y210; Unknown; 
MTCY210;31, unknown,
len: 299 aa, slight 
similarity to 
carboxykinases 


3.10E- 
10 
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m 



Nearest
Neighbor
(BlastN vs.
Genbank)


Nearest
Neighbor
(BlastX vs.
Non-
Redundant
Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P

VALUE


ACCESSION


DESCRIPTION


P

VALUE


42


<NONE>


<NONE>


<NONE>


CEC01G10_5


Caenorhabditis elegans
cosmid C01G10,
complete sequence;
C01G10;8; CDNA EST
CEMSC45R comes from
this
gene>GP:CEC01G10_5
Caenorhabditis elegans
cosmid C01G10;
C01G10;8; CDNA EST
CEMSC45R comes from
this gene


2.30E-
12


43


<NONE>


<NONE>


<NONE>


HSU15779_1


Human p70 (ST5)
mRNA, alternatively
spliced, complete cds;
Differentially expressed;
alternatively spliced


9.50E-
14


44


<NONE>


<NONE>


<NONE>


MTCY210_31


Mycobacterium
tuberculosis cosmid
Y210; Unknown;
MTCY210;31, unknown,
len: 299 aa, slight
similarity to
carboxylases


1.70E-
17



45 



U61403 



Dictyostelium 
discoideum PrlA 
(prlA) mRNA, 
partial cds. 



U93472_1



Danio rerio PPARB 
gene, partial cds; Nuclear 
receptor C domain 



0.95 



46 



Z92832 



Caenorhabditis 
elegans DNA *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
F31D4;HTGS 
phase 1.



U93472_1



Danio rerio PPARB 
gene, partial cds; Nuclear 
receptor C domain 



0.94 



47 



L36557 



Oryza sativa 
(clone pRG3) 
repetitive 
element. 



HSU61262_1



Human neogenin mRNA, 
complete cds 



0.89 
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RilPifl 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 




SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION


P 

VALUE 


48


AF005898 


Homo sapiens 

Na,K-ATPase 

beta-3 subunit 

pseudogene, 

complete 

sequence. 


1 


LRP1_CHICK 


LOW-DENSITY 

LIPOPROTEIN 

RECEPTOR-RELATED 

PROTEIN 1 

PRECURSOR (LRP) 

(ALPHA-2- 

MACROGLOBULIN 

RECEPTOR) 
(A2MR)>PIR2:A53102 

LDL receptor-related 
protein / alpha-2- 
macroglobulin receptor 

precursor - 

chicken>GP:GGLRPA2
MR_1 G;gallus mRNA
for LRP/alp 


0.85 


49


U18795


Saccharomyces 
cerevisiae 
chromosome V 
cosmids 9669, 
8334, 8199, and 
lambda clone 
1160. 


1 


NKC1_SQUA
C 


BUMETANIDE- 
SENSITIVE SODIUM- 
(POTASSIUM)-
CHLORIDE 
COTRANSPORTER 2 

(NA-K-CL 

SYMPORTER)>PIR2:A 
53491 bumetanide- 
sensitive Na-K-Cl 
cotransporter - spiny 
dogfish>GP:SANKCC1_
1 Squalus acanthias 
bumetanide-sensitive Na- 
K-Cl cotransport protein 
(NKCC 


0.73 


50 


AC002523


Homo sapiens; 
HTGS phase 1, 
54 unordered 
pieces. 


1 


BXEN_CLOB
O


BOTULINUM 
NEUROTOXIN TYPE 

E, NONTOXIC 
COMPONENT>GP:CLO 

ENT120_1 C;botulinum 
gene for nontoxic 
component of progenitor 
toxin, complete cds 


0.71 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


51 


AC002345 


SEQUENCING 
IN PROGRESS 
*** Genomic 
sequence from
Human 17;
HTGS phase 1,
10 unordered 
pieces. 


1 


P3K2_DICDI


PHOSPHATIDYLINOSI 
TOL 3-KINASE 2 (EC 
2.7.1.137) (PI3- 
KINASE) (PTDINS-3- 
KINASE) 

(PI3K)>GP:DDU23477_ 
1 Dictyostelium 
discoideum 

phosphatidylinositol-4,5- 
diphosphate 3-kinase
(PIK2) mRNA, complete 
cds 


0.58 


52 


X14253


Human mRNA 
for cripto protein. 


1 


155651 


noradrenaline transporter 

bovine>GP:BTU09198_1
Bos taurus noradrenaline 
transporter mRNA, 
complete cds 


0.55 


53 


U23516 


Caenorhabditis 
elegans cosmid 
B0416. 


1 


169024 


MHC sex-limited protein 
- mouse 

(fragment)>GP:MUSMH 
C4ADJ Mouse class III 
H2-Slp sex-limited 
protein gene 5 exons 1 , 2 
and 3; MHC sex-limited 
protein 


0.47 


54 


AB006698 


Arabidopsis 
thaliana genomic 

DNA, 

chromosome 5, 
P1 clone:
MCL19. 


1 


S81293_1


L1 {insertion sequence,
provirus} [human 
papillomavirus type 6b 
HPV6b, KP4, Genomic 
Mutant, 121 nt]; Authors 
note this reading frame 
results from a 454 bp 
deletion and resulting 


0.25 


55 


K03458 


Human 

immunodeficienc 
y virus type 1, 
isolate Zaire 6, 
vif, tat, rev, env, 
nef genes and 3' 
LTR. 


1 


S13383


hydroxyproline-rich 
glycoprotein - sorghum 


0.24 
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111 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 

Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE


56 


B26794 


T1016TRTAMU 
Arabidopsis 
thaliana genomic 
clone T1016. 


1 


RK34_PORP
U


CHLOROPLAST 50S

RIBOSOMAL 

PROTEIN 

L34>PIR2:S73111 

ribosomal protein L34 - 

red alga (Porphyra 

purpurea) 

chloroplast>GP:PPU388 
04_4 Porphyra purpurea 
chloroplast genome, 
complete sequence; 50S 
ribosomal protein L34 


0.021 


57 


Z98950 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
507115; HTGS 
phase 1. 


1 


D41132 


collagen-related protein 4 
- Hydra magnipapillata 
(fragment)>PIR2:S21932
mini-collagen - Hydra
sp.>GP:HSNCOL4_1
Hydra N-COL 4 mRNA 
for mini-collagen; No 
start codon 


0.02 


58 


U57057 


Human WD 
protein IR10 
mRNA, complete 
cds. 


1 


DMU15602_1


Drosophila melanogaster 
(zeste-white 4) mRNA, 
complete cds; Similar to 
C; elegans B0464;4 gene 
product, Swiss-Prot 
Accession Number 
Q03562 


0.019 


59 


U57057 


Human WD 
protein IR10 
mRNA, complete 
cds. 


1 


CR2_MOUSE 


COMPLEMENT 
RECEPTOR TYPE 2

PRECURSOR (CR2) 
(COMPLEMENT C3D 
RECEPTOR)>PIR2:A43 
526 complement 
C3d/Epstein-Barr virus 
receptor 2 precursor - 
mouse>GP:MUSCR2AA 
_1 Murine complement 
receptor type 2 (CR2) 
mRNA, complete cds; 
Complement receptor 
type 


0.0074 
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iiBlli 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 




SEQ
ID


ACCESSION


DESCRIPTION


P

VALUE


ACCESSION


DESCRIPTION


P

VALUE


60


B65337 


CIT-HSP- 
2021H21.TF 
CIT-HSP Homo 
sapiens genomic 
clone 2021H21. 


1 


A38096 


perlecan precursor - 
human>GP:HUMHSPG2 
B_1 Human heparan
sulfate proteoglycan 
(HSPG2) mRNA, 
complete cds 


0.0051 


61 


U84722 


Human vascular 
endothelial 
cadherin mRNA, 
complete cds. 


1 


HSTAFII13J 


H;sapiens mRNA for 
TAFII135;Subunit of 
RNA polymerase II 
transcription factor 
TFIID 


0.0012 


62 


L41493


Avian rotavirus 
(strain turkey 1) 
genomic segment 
4 outer capsid 
protein (VP8*) 
gene. 


1 


Y328_MYCP
N 


HYPOTHETICAL 
PROTEIN MG328 
HOMOLOG>PIR2:S736
93 MG328 homolog 
P01_prfl033- 
Mycoplasma pneumoniae 
(ATCC 29342) 
(SGC3)>GP:MPAE0000
35_2 Mycoplasma 
pneumoniae from bases 

(section 35 of 63) of the 
complete genome; 
MG328 homolog, 


0.00015 


63 


D63139 



Aeromonas sp. 
gene for 
chitinase, 
complete and 
partial cds. 


1 


MTCY16B7_3 


Mycobacterium 
tuberculosis cosmid 
SCY16B7; Unknown; 
MTCY16B7;03, 
initiation factor, len: 900, 
similar at C-terminal half 
to eg IF2_BACSU 
P17889 initiation factor
if-2(716aa),fasta 


6.30E-
05 
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Nearest 

Neighbor

(BlastN vs. 
Genbank) 






Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 














Proteins) 






SEQ 
ID


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


64


J04974 


Human alpha-2 
type XI collagen 
mRNA 
(COL11A2). 


1 


GDF6 BOVI 
N 


GROWTH/DIFFERENT 
IATION FACTOR GDF- 
6 PRECURSOR 
(CARTILAGE- 
DERIVED 
MORPHOGENETIC 
PROTEIN 2) (CDMP-2) 

5452 cartilage-derived 
morphogenetic protein 2 
precursor - bovine 
(fragment)>GP:BTU136 
61_1 Bos taurus 
cartilage-derived morp 


1.00E- 
05 


65 


AC002394 


Homo sapiens 
Chromosome 16 
BAC clone 
CIT987-SKA- 
211C6 - complete
genomic 
sequence, 
complete 
sequence. 


1 


CELC14F11 
6 


Caenorhabditis elegans 
cosmid C14F11; Similar
to aspartate 

aminotransferase; coded 
for by C; elegans cDNA 
CEMSF95FB; coded for 
by C; elegans cDNA 
yk41e4;3; coded for by 

C; elegans 


4.60E- 
06 


66 


AB002312 


Human mRNA 
for KIAA0314
gene, partial cds.


1


NAT1_YEAS
T


N-TERMINAL

ACETYLTRANSFERAS 
E 1 (EC 2.3.1.88) 
(AMINO-TERMINAL, 

ALPHA- AMINO, 
ACETYLTRANSFERAS 

E 1) 


1.00E-
09 


67 


AC003085


Human BAC 
clone RG094H21 
from 7q21-q22, 
complete 
sequence. 


1 


DP19_CAEEL 


DPY-19 

PROTEIN>PIR2:S44629 
f22b7.10 protein -
Caenorhabditis
elegans>GP:CELF22B7_
9 Caenorhabditis elegans
(Bristol N2) cosmid 
F22B7; Putative 


4.20E- 
11 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank)


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


68 


X55026 


P.anserina 

complete 

mitochondrial 

genome. 


1 


NAT1_YEAS
T 


N-TERMINAL 
ACETYLTRANSFERAS 
E 1 (EC 2.3.1.88) 
(AMINO-TERMINAL, 
ALPHA- AMINO, 
ACETYLTRANSFERAS 

E 1) 


8.40E- 
12 


69 


Z95399 


Caenorhabditis 
elegans DNA ***
SEQUENCING 
IN PROGRESS 
*** from clone 
Y39B6; HTGS 
phase 1. 


1 


CER06B9_5 


Caenorhabditis elegans 
cosmid R06B9, complete 
sequence; R06B9;b; 
Protein predicted using 
Genefinder; preliminary 
prediction 


1.50E- 
24 


70 


AC002339 


Arabidopsis 

thaliana 

chromosome II 

BAC T11A07

genomic 

sequence, 

complete 

sequence. 


0.99 


POLG_BVDV
S 


GENOME 

POLYPROTEIN>PIR1:
A44217 genome
polyprotein - bovine viral
diarrhea virus (strain SD-
1)>GP:BVDPOLYPRO_
1 Bovine viral diarrhea 
virus polyprotein RNA, 
complete cds; Putative 


1 


71 


Y08559 


B.subtilis urease 
operon and 
downstream 
DNA. 


0.99 


LRP_CAEEL 


LOW-DENSITY 
LIPOPROTEIN 
RECEPTOR-RELATED 
PROTEIN PRECURSOR 
(LRP)>PIR2:A47437 
LDL-receptor-related 
protein - Caenorhabditis 
elegans>GP:CEF29D11_
2 Caenorhabditis elegans 
cosmid F29D11, 
complete sequence; 
F29D11;1; Protein 
predicted using Genefi 


1
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


72 


U67548 


Methanococcus 
jannaschii from 
bases 986219 to 
996377 (section 
90 of 150) of the 
complete 
genome. 


0.99 


YB60_YEAS
T 


HYPOTHETICAL 16.3 
KD PROTEIN IN 
DUR1,2-NGR1 
INTERGENIC 
REGION>PIR2:S46084 
probable membrane 
protein YBR210w - yeast 
(Saccharomyces 
cerevisiae)>GP:SCYBR2 
10W_1 S;cerevisiae 
chromosome II reading 
frame ORF YBR210w 


1 


73 


U51645 


Plasmodium 
falciparum 
cytidine 
triphosphate 
synthetase gene, 
complete cds. 


0.99 


HPSVRPL_1


Sin Nombre virus (NM
H10) RNA L segment
encoding RNA
polymerase (L protein),
complete cds; Viral RNA
polymerase (L protein);
Putative>GP:HPSVRPL
A_1 Sin Nombre virus
(NM R11) RNA L
segment encoding RNA
polymerase (L protein), 
complete cds; Vir 


0.99 


74 


Z49889 


Caenorhabditis 
elegans cosmid 
T06H11, 
complete 
sequence. 


0.99 


MUSHDPRO 
B_l 


Mouse alternatively 
spliced HD protein 
mRNA, complete cds 


0.021


75 


Z69374 


Human DNA 
sequence from 
cosmid L174G8,
Huntington's
Disease Region,
chromosome
4p16.3 contains a
pair of ESTs. 


0.99 


NCPR_YEAS
T 


NADPH- 

CYTOCHROME P450 
REDUCTASE (EC 
1.6.2.4) (CPR) 


0.017 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


76 


Z35847 


S.cerevisiae 
chromosome II 
reading frame 
ORF YBL086c. 


0.99 


CYPA_CAEE
L 


PEPTIDYL-PROLYL 

CIS-TRANS 
ISOMERASE 10 (EC
5.2.1.8) (PPIASE) 
(ROTAMASE) 
(CYCLOPHILIN- 
10)>GP:CELB0252_4 
Caenorhabditis elegans 
cosmid B0252; Similar to 
peptidyl-prolyl cis-trans
isomerase (PPIASE) 
(CYCLOPHILIN)>GP:C 
EU34954_1 
Caenorhabditis el 


0.0044 


77 


L35330 


Rattus norvegicus 
glutathione S- 
transferase Yb3 
subunit gene,
complete cds. 


0.99 


CELR148_1


Caenorhabditis elegans
cosmid R148; Contains
similarity to drosophila
DNA-binding protein
K10 (NID:g8148); coded
for by C; elegans cDNA
yk118e11;5; coded for by
C; elegans cDNA 


0.0032 


78 


Y00324 


Chicken 

vitellogenin gene 
3' flanking 
region. 


0.99 


A56922 


transcription factor shn - 
fruit fly (Drosophila 
melanogaster) 


0.0023 


79 


M32659 


D.melanogaster 
Shab11 protein 
mRNA, complete 
cds. 


0.99 


OMU25146_1 


Oncorhynchus mykiss 
recombination activating 
protein 2 gene, partial 
cds 


0.0017 


80 


Z69880 


H.sapiens 
SERCA3 gene 
(partial). 


0.99 


M84D_DROME 


MALE SPECIFIC 
SPERM PROTEIN 
MST84DD>PIR2:S2577 
5 testis-specific protein 
Mst84Dd - fruit fly 
(Drosophila 

melanogaster)>GP:DMM 

ST84D_4 
D;melanogaster 
Mst84Da, Mst84Db, 
Mst84Dc and Mst84Dd 
genes for put; sperm 
protein 


0.0011 


81 


M99166 


Escherichia coli 
Trp repressor 
binding protein 
(wrbA) gene, 
complete cds. 


0.99 


MTU88962_1 


Mycobacterium 
tuberculosis unknown 
protein gene, partial cds 


6.50E-07 


82 


X99257 


R.norvegicus 
mRNA for lamin 


0.99 


MIU68729_1 


Meloidogyne incognita 
cuticle preprocollagen 
(col-2) mRNA, complete 
cds; Putative 


1.60E-09 


83 


AC002432 


Human BAC 
clone RG317G18 
from 7q31, 
complete 
sequence. 


0.98 


1FMDC 


Foot and mouth disease 
virus type C-S8c1, chain 
C - foot and mouth 
disease virus type C-S8c1 
expressed in hamster 
kidney cells 


0.14 


84 


Z34799 


Caenorhabditis 
elegans cosmid 
F34D10, 
complete 
sequence. 


0.98 


MMU57368_1 


Mus musculus EGF 
repeat transmembrane 
protein mRNA, complete 
cds; Notch like repeats; 
notch 2 


0.0028 


85 


B15207 


344E15.TV 
CIT978SKA1 
Homo sapiens 
genomic clone A- 
344E15. 


0.98 


POLG_HCVJ6 


GENOME 
POLYPROTEIN 
(CONTAINS: CAPSID 
PROTEIN C (CORE 
PROTEIN); MATRIX 
PROTEIN (ENVELOPE 
PROTEIN M); MAJOR 
ENVELOPE PROTEIN 
E; NONSTRUCTURAL 
PROTEINS NS1,NS2, 
NS4A AND NS4B; 
HELICASE (NS3); 
RNA-DIRECTED RNA 
POLYMERASE (EC 
2.7.7.48) (NS5))>PI 


0.00083 


86 


AC002412 


*** 

SEQUENCING 
IN PROGRESS 
*** Human 
Chromosome X; 
HTGS phase 1,2 
unordered pieces. 


0.98 


KDG1_ARATH 


DIACYLGLYCEROL 
KINASE 1 (EC 
2.7.1.107) 
(DIGLYCERIDE 
KINASE) (DGK 1) 

(DAG KINASE 
1)>PIR2:S71467 
diacylglycerol kinase 
(EC 2.7.1.107) ATDGK1 

- Arabidopsis 
thaliana>GP:ATHATDG 
K1_1 Arabidopsis 
thaliana mRNA for 
diacylglycerol kinase, 
complete c 


0.00024 


87 


X57010 


Human COL2A1 
gene for collagen 
II alpha 1 chain, 
exons E2-E15. 


0.98 


D80005_1 


Human mRNA for 
KIAA0183 gene, partial 
cds 


5.90E-10 


88 


M83093 


Neurospora 
crassa cAMP- 
dependent protein 
kinase (cot-1) 
gene, complete 
cds. 


0.98 


YA53_SCHPO 


HYPOTHETICAL 24.2 
KD PROTEIN 
C13A11.03 IN 
CHROMOSOME 
I>GP:SPAC13A11_3 
S;pombe chromosome I 
cosmid c13A11; 
Unknown; 
SPAC13A11;03, 
unknown, len: 210 


3.00E-22 



Docket No. 1480P 

Table 2 


89 


U96271 


Helicobacter 
pylori heat shock 
protein 70 
(hsp70) gene, 
complete cds. 


0.97 


SLMEN6_1 


S;latifolia mRNA for 
Men-6 

protein>GP:SLMEN6_1 
S;latifolia mRNA for 
Men-6 protein 


0.43 


90 


U49944 


Caenorhabditis 
elegans cosmid 
C39E6. 


0.97 


RON_HUMAN 


MACROPHAGE 
STIMULATING 
PROTEIN RECEPTOR 
PRECURSOR (EC 
2.7.1.112)>PIR2:I38185 
protein-tyrosine kinase 
(EC 2.7.1.112), receptor 
type ron - 
human>GP:HSRON_1 
H;sapiens RON mRNA 
for tyrosine kinase; 
Putative 


0.034 


91 


Y09255 


B.cereus dnaI 
gene, partial. 


0.97 


CELT05C1_5 


Caenorhabditis elegans 
cosmid T05C1; Coded 
for by C; elegans cDNA 
yk30f6;3; coded for by 
C; elegans cDNA 
yk34f10;3 


0.00043 


92 


AC002413 


SEQUENCING 
IN PROGRESS 
*** Human 
Chromosome X; 
HTGS phase 1 , 2 
unordered pieces. 


0.96 


CELC44E4_5 


Caenorhabditis elegans 
cosmid C44E4; Weak 
similarity to the 
drosophila hyperplastic 
disc protein 

(GB:L14644); coded for 
by C; elegans cDNA 
yk49h6;5; coded for by 
C; elegans cDNA 


1 


93 


U41625 


Caenorhabditis 
elegans cosmid 
K03A1. 


0.96 


HMGC_HUMAN 


HIGH MOBILITY 
GROUP PROTEIN 
HMGI-C>PIR2:JC2232 
high mobility group I-C 
phosphoprotein - 
human>GP:HSHMGICG 
5_1 Human high- 
mobility group 
phosphoprotein isoform 
I-C (HMGIC) gene, exon 
5>GP:HSHMGICP_1 
H;sapiens mRNA for 
HMGI-C 

protein>GP:HSHMGIC 


1 


94 


Z82202 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
34P24; HTGS 
phase 1. 


0.96 


YTH3_CAEEL 


HYPOTHETICAL 75.5 
KD PROTEIN C14A4.3 
IN CHROMOSOME 
II>GP:CEC14A4_3 
Caenorhabditis elegans 
cosmid C14A4, complete 
sequence; C14A4;3; 
Weak similarity with a B; 
Flavum translocation 
protein (Swiss Prot 
accession number 
P38376) 


0.73 


95 


AL008734 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
324M8; HTGS 
phase 1. 






extensin precursor (clone 
Tom L-4) - 

tomato>GP:TOMEXTE 
NB_1 L;esculentum 
extensin (class II) gene, 
complete cds 






96 


L15388 


Human G 
protein-coupled 
receptor kinase 
(GRK5) mRNA, 
complete cds. 


O OA 


HUMCOL7A1 
X_1 


Homo sapiens (clones: 
CW52-2, CW27-6, 
CW15-2 CW26-5, 11- 
67) collagen type VII 
intergenic region and 
(COL7A1) gene, 
complete cds 


4.60E-06 


97 


X97384 


A.thaliana atran3 
gene. 


0.95 


<NONE> 


<NONE> 


<NONE> 


98 


M62505 


Human C5a 
anaphylatoxin 
receptor mRNA, 
complete cds. 


0.95 


RIPB_BRYDI 


RIBOSOME- 
INACTIVATING 
PROTEIN BRYODIN 
(RRNA N- 
GLYCOSIDASE) (EC 
3.2.2.22) 
(FRAGMENT)>PIR2:S1 
6491 rRNA N- 
glycosidase (EC 
3.2.2.22) bryodin - red 
bryony (fragment) 


0.83 


99 


D28778 


Cucumber mosaic 
virus RNA 1 for 
1a, complete 
sequence. 


0.95 


POLS_RUBV 
M 


STRUCTURAL 
POLYPROTEIN 
(CONTAINS: 
NUCLEOCAPSID 
PROTEIN C; 
MEMBRANE 
GLYCOPROTEINS E1 
AND 
E2)>PIR1:GNWVR3 
structural polyprotein - 
rubella virus (strain 
M33)>GP:TORUB24S_1 
Rubella virus 24S 
subgenomic mRNA for 
structural proteins E1, E2 
and C; 


0.00037 


100 


AF016202 


Homo sapiens 
immunoglobulin 
heavy chain 
CDR3 gene, 
partial cds. 


0.93 


HSU79716_1 


Human reelin (RELN) 
mRNA, complete cds 


1 




101 


Z68303 


Caenorhabditis 
elegans cosmid 
ZK809, complete 
sequence. 


0.93 


HS5HT4SAR_ 
1 


H;sapiens mRNA for 
serotonin 4SA receptor 
(5-HT4SA-R) 


0.87 




102 


X03049 


E. coli DNA 
sequence 5' to 
origin of 
replication oriC. 


0.93 


S37594 


mucin - human 
(fragment) 


0.0019 


103 


M32659 


D.melanogaster 
Shab11 protein 
mRNA, complete 
cds. 


0.93 


S38480 


nonstructural protein - 
rubella 

virus>GP:RVM33NP_1 
Rubella virus M33 RNA 
for a nonstructural 
protein; Nonstructural 
protein genes 


2.30E-06 


104 


D88687 


Human mRNA 
for KM-102- 
derived 
reductase-like 
factor, complete 
cds. 


0.93 


BAT3_HUMAN 


LARGE PROLINE- 
RICH PROTEIN BAT3 
(HLA-B-ASSOCIATED 
TRANSCRIPT 
3)>PIR2:A35098 MHC 
class III 
histocompatibility 
antigen HLA-B- 
associated transcript 3 - 
human>GP:HUMBAT3 

A_1 Human HLA-B- 
associated transcript 3 
(BAT3) mRNA, 
complete 

cds>GP:HUMBAT3 


8.70E-07 


105 


D16847 


Mouse mRNA for 
stromal cell 
derived protein-1, 
complete cds. 


0.93 


S52796 


prpL2 protein - human 
(fragment)>GP:HSPRPL 
2_1 H;sapiens mRNA for 
PRPL-2 protein 


3.20E-08 


106 


D90915 


Synechocystis sp. 
PCC6803 
complete 
genome, 17/27, 
2137259- 
2267259. 




YEK9_YEAST 


HYPOTHETICAL 53.9 
KD PROTEIN IN AFG3- 
SEB2 INTERGENIC 
REGION>PIR2:S50477 
hypothetical protein 
YER019w - yeast 
(Saccharomyces 
cerevisiae)>GP:SCE9537 
_20 Saccharomyces 
cerevisiae chromosome 
V cosmids 9537, 9581, 
9495, 9867, and lambda 
clone 5898 


^ one 
05 
107 


AJ001101 


Mus musculus 
mRNA for 
gC1qBP gene. 


0.92 


DMU58282_1 


Drosophila melanogaster 
Bowel (bowl) mRNA, 
complete cds; 
Transcription factor; 
C2H2 zinc finger protein; 
zinc fingers have 
extensive sequence 
similarity to Drosophila 
odd-skipped 


JJOE-05 


108 


X57108 


Human gene for 
cerebroside 
sulfate activator 
protein, exons 10- 
14. 


0.92 


S69032 


hypothetical protein 
YPR144c - yeast 
(Saccharomyces 
cerevisiae)>GP:YSCP96 
59_17 Saccharomyces 
cerevisiae chromosome 
XVI cosmid 9659; 
Ypr144cp; Weak 
similarity near C- 
terminus to RNA 
Polymerase beta subunit 
(Swiss Prot; accession 
number P11213) 


4.30E-21 



109 



D14635 



Caenorhabditis 
elegans DNA for 
EMB-5. 



0.91 



YM13_YEAST 



PUTATIVE ATP- 
DEPENDENT RNA 
HELICASE 

YMR128W>PIR2:S5305 
8 probable membrane 
protein YMR128w- 
yeast (Saccharomyces 
cerevisiae)>GP:SC9553_ 
4 S;cerevisiae 
chromosome XIII cosmid 
9553; Unknown; 
YM9553;04, probable 
ATP-dependent RNA 
helicase, len: 



0.69 



110 



B55500 



CIT-HSP- 
387J2.TFB CIT- 
HSP Homo 
sapiens genomic 
clone 387J2. 



0.91 



U97553_79 



Murine herpesvirus 68 
strain WUMS, complete 
genome; Unknown 



0.00016 


111 


X03049 


E. coli DNA 
sequence 5' to 
origin of 
replication oriC. 


0.9 


POL_MLVAV 


POL POLYPROTEIN 
(PROTEASE (EC 
3.4.23.-); REVERSE 
TRANSCRIPTASE (EC 
2.7.7.49); 
RIBONUCLEASE H 
(EC 
3.1.26.4))>PIR1:GNMV 
GV pol polyprotein - 
AKV murine leukemia 
virus 


0.0019 



112 


U91327 


Human 
chromosome 
12p13 BAC clone 
CIT987SK-99D8 
complete 
sequence. 


0.89 


JC55oo 


serine protease (EC 3.4.- 
.-) hi - Serratia 
marcescens 


1 


113 


X13295 


Rat mRNA for 
alpha-2u 
globulin-related 
protein. 


0.89 


MNGPOLY_1 


Mengo virus polyprotein 
genome, complete cds 
with repeats 


1 


114 


Z78415 


Caenorhabditis 
elegans cosmid 
C17G1, complete 
sequence. 


0.89 


AB000121_1 


Mouse mRNA for 
TBPIP, complete cds; 
TBP1 interacting protein 


0.39 




115 


AC002308 


*** 

SEQUENCING 
IN PROGRESS 
*** Human 
Chromosome 
22q11 BAC 
Clone 1000e4; 
HTGS phase 1, 
96 unordered 
pieces. 


0.88 


YLK2_CAEEL 


HYPOTHETICAL 122.7 
KD PROTEIN D1044.2 
IN CHROMOSOME 

TTT>f*;P'PFT HI Odd A 

Caenorhabditis elegans 
cosmid D1044 


0.0037 




116 


AC002073 


Human PAC 
clone DJ515N1 

from 22q11.2- 
q22, complete 
sequence. 


0.88 


S28499 


probable finger protein - 
rat>GP:RNZFP_1 

R;norvegicus mRNA for 
putative zinc finger 
protein 


1.10E-31 


117 


Z83848 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
57A13; HTGS 
phase 1. 


0.87 


NDL_DROME 


SERINE PROTEASE 
NUDEL PRECURSOR 
(EC 3.4.21.- 
)>PIR2:A57096 nudel 
protein precursor - fruit 
fly (Drosophila 
melanogaster)>GP:DMU 
29153_1 Drosophila 
melanogaster nudel (ndl) 
mRNA, complete cds; 
Serine protease; Soma 
dependent gene required 
matern 


1 


118 


U23449 


Caenorhabditis 
elegans cosmid 
K06A1. 


0.87 


AF023268_3 


Homo sapiens clk2 
kinase (CLK2), propin1, 
cote1, glucocerebrosidase 
(GBA), and metaxin 
genes, complete cds; 
metaxin pseudogene and 
glucocerebrosidase 
pseudogene; and 
thrombospondin3 

(THBS3) 


0.21 


119 


Z68181 


H.vulgaris 
mRNA for 
elongation factor 
EF1-alpha. 


0.87 


RABCY450C 
J 


Rabbit cytochrome P-450 
gene, clone pP-450PBc3, 
3' end 


0.14 


120 


AC000033 


Homo sapiens 
chromosome 9, 
complete 
sequence. 


0.87 


VWF_CANFA 


VON WILLEBRAND 
FACTOR 

PRECURSOR>GP:DOG 
VWG_1 Canis familiaris 
von Willebrand factor 
mRNA, complete cds 


0.036 


121 


U23449 


Caenorhabditis 
elegans cosmid 
K06A1. 


0.86 


S48988_1 


CRP-1=cystatin-related 
protein [rats, Wistar 
albino, mRNA Partial, 
213 nt]; Cystatin-related 
protein; Method: 
conceptual translation 
supplied by author; This 
sequence comes from 
Fig; 


0.64 


122 


Z89651 


F.rubripes GSS 
sequence, clone 
090I24cD5. 


0.86 


CPU65981_1 


Cryptosporidium parvum 
P-ATPase gene (CppA- 
El) gene, complete cds; 
Putative calcium-ATPase 


0.6 


123 


Z94055 


Human DNA 
sequence from 
PAC 24M15 on 
chromosome 1. 
Contains 
tenascin-R 
(restricting EST. 


0.86 


GLTB_SYNY3 


FERREDOXIN- 
DEPENDENT 
GLUTAMATE 
SYNTHASE 1 (EC 

1.4.7.1) (FD- 
GOGAT)>PIR2:S60228 

glutamate synthase 
(ferredoxin) (EC 1.4.7.1) 
gltB - Synechocystis sp. 
(PCC 

6803)>GP:D90902_66 
Synechocystis sp; 
PCC6803 complete 
genome, 4/27, 402290- 
524345; Gluta 


0.03 


124 


Z49250 


Human DNA 
sequence from 
cosmid HW2, 
Huntington's 
Disease Region, 
chromosome 
4p16.3. 


0.86 


TRSCAPSID 
1 


Tobacco ringspot virus 
capsid protein gene, 
complete cds 


3.00E-06 


125 


Z92855 


Caenorhabditis 
elegans DNA *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
Y48C3; HTGS 
phase 1. 


0.84 


AE000809_8 


Methanobacterium 
thermoautotrophicum 
from bases 161632 to 
172569 (section 15 of 
148) of the complete 
genome; Aspartyl-tRNA 
synthetase; Function 
Code:10;07 - Metabolism 
of 


1 


126 


AC002340 


*** 

SEQUENCING 
IN PROGRESS 
*** Arabidopsis 
thaliana 'TAMU' 
BAC T11J7 
genomic 
sequence near 
marker 'm283'; 
HTGS phase 1,2 
unordered pieces. 


0.83 


CET01E8_3 


Caenorhabditis elegans 
cosmid T01E8, complete 
sequence; T01E8;3; 
Similar to 1- 
phosphatidylinositol-4,5- 
bisphosphate 
phosphodiesterase; 
cDNA EST CEESG02F 
comes from this gene; 


0.86 


127 


AL008716 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
206C7; HTGS 
phase 1. 


0.83 


HIVU51189_5 


HIV-1 clone 93th253 
from Thailand, complete 
genome; Tat protein 


0.86 


128 


AC002340 


SEQUENCING 
IN PROGRESS 
*** Arabidopsis 
thaliana 'TAMU' 
BAC T11J7 
genomic 
sequence near 
marker 'm283'; 
HTGS phase 1,2 
unordered pieces. 


0.83 


S60257 


meltrin alpha - 
mouse>GP:MUSMAB_l 
Mouse mRNA for 
meltrin alpha, complete 
cds 


0.0013 


129 


Z83848 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
57A13;HTGS 
phase 1. 


0.82 


ARO1_PNECA 


PENTAFUNCTIONAL 
AROM POLYPEPTIDE 
(CONTAINS: 3- 
DEHYDROQUINATE 

SYNTHASE (EC 
4.6.1.3), 3- 

DEHYDROQUINATE 
DEHYDRATASE (EC 
4.2.1.10) (3- 
DEHYDROQUINASE), 
SHIKIMATE 5- 
DEHYDROGENASE 
(EC 1.1.1.25), 
SHIKIMATE KINASE 
(EC 2.7.1.71), AND 
EPSP SYNTHASE (E 


0.0098 


130 


AF029308 


Homo sapiens 
chromosome 9 
duplication of the 
T cell receptor 
beta locus and 
trypsinogen gene 
families. 


0.8 


CELZK84.5 


Caenorhabditis elegans 
cosmid ZK84; Final exon 
in repeat region; similar 
to long tandem repeat 
region of sialidase 
(SP:TCNA_TRYCR, 
P23253) and 
neurofilament H protein; 
coded for by C; elegans 


2.00E-08 


131 


AC002458 


Human BAC 
clone RG098M04 
from 7q21-q22, 
complete 
sequence. 


0.78 


IGF2_PIG 


INSULIN-LIKE 
GROWTH FACTOR II 
PRECURSOR (IGF- 
II)>GP:SSIGF2_1 
S;scrofa mRNA IGF2 for 
insulin-like-growth factor 
2; Insulin-like-growth 
factor 2 preproprotein 


0.44 


132 


Z83843 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
368A4; HTGS 
phase 1. 


0.78 


PAR51A_1 


P;tetraurelia 51A surface 
protein gene, complete 
cds 


0.0014 


133 


X03021 


Human gene for 
granulocyte- 
macrophage 
colony 

stimulating factor 
(GM-CSF). 


0.78 


CEF57B1J 


Caenorhabditis elegans 
cosmid F57B1, complete 
sequence; F57B1;3; 
Protein predicted using 
Genefinder; similar to 
collagen 


2.20E-05 


134 


Z74825 


S.cerevisiae 
chromosome XV 
reading frame 
ORF YOL083W. 


0.77 


SYLM_SCHPO 


PUTATIVE LEUCYL- 
TRNA SYNTHETASE, 
MITOCHONDRIAL 
PRECURSOR (EC 
6.1.1.4) (LEUCINE- 
TRNA 

LIGASE)>PIR2:S62486 
hypothetical protein 
SPAC4G8.09 - fission 
yeast 

(Schizosaccharomyces 
pombe)>GP:SPAC4G8_ 
9 S;pombe chromosome I 
cosmid c4G8; Unknown; 
SPAC 


0.96 


135 


Z74825 


S.cerevisiae 
chromosome XV 
reading frame 
ORF YOL083w. 


0.77 


RNU59809_1 


Rattus norvegicus 
mannose 6- 
phosphate/insulin-like 
growth factor II receptor 
(M6P/IGF2r) mRNA, 
complete cds; Also 
termed IGF-II/Man 6-P 
receptor, MPR, CI-MPR 


0.01 


136 


U80445 


Caenorhabditis 
elegans cosmid 
C50F2. 


0.76 


S28499 


probable finger protein - 
rat>GP:RNZFP_1 
R;norvegicus mRNA for 
putative zinc finger 
protein 


31 


137 


Z78545 


Caenorhabditis 
elegans cosmid 
M03B6, complete 
sequence. 


0.75 


RRU73586_1 


Rattus norvegicus 
Fanconi anemia group C 
mRNA, complete cds; 
Fanconi anemia group C 
protein; Similar to human 
FAC protein, GenBank 
Accession Numbers 
X66893 and X66894 


0.023 


138 


Z97630 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
466N1;HTGS 
phase 1. 


0.74 


HSMSHREC 
A_1 


H;sapiens mRNA for 
MSH receptor; Author- 
given protein sequence is 
in conflict with the 
conceptual translation 


0.036 


139 


AF007269 


Arabidopsis 
thaliana BAC 
IG002N01. 


0.71 


HSU95090_1 


Homo sapiens 
chromosome 19 cosmid 
F19541, complete 
sequence; F19541_1; 
Hypothetical (partial) 
protein similar to proline 
oxidase 


0.16 


140 


AC002393 


Mouse 
BAC284H12 
Chromosome 6, 
complete 
sequence. 


0.7 


RNLTBP2_1 


Rattus norvegicus mRNA 
for LTBP-2 like protein; 
Latent TGF- beta binding 
protein-2 like protein 


4.40E-05 


141 


B15232 


344G8.TV 
CIT978SKA1 
Homo sapiens 
genomic clone A- 
344G08. 


0.67 


DMSEVL2_2 


Drosophila melanogaster 
sevenless mRNA; Put; 
sevenless protein (AA 1 - 
2510) 


0.41 


142 


D13748 


Human mRNA 
for eukaryotic 
initiation factor 
4AI. 


0.66 


MMU53563_1 


Mus musculus Brg1 
mRNA, partial cds; N- 
terminal region of the 
protein 


0.00016 


143 


S45791 


band 3-related 
protein=renal 
anion exchanger 
AE2 homolog 
[rabbits, New 
Zealand White, 
ileal epithelial 
cells, mRNA, 
3964 nt]. 


0.66 


POLS_RUBVR 


STRUCTURAL 
POLYPROTEIN 
(CONTAINS: 
NUCLEOCAPSID 
PROTEIN C; 
MEMBRANE 
GLYCOPROTEINS E1 
AND 
E2)>PIR1:GNWVRA 
structural polyprotein - 
rubella virus (strain 
RA27/3 
vaccine)>GP:RUBCE21_ 
1 Rubella virus RA27/3 
RNA for capsid, E2 and 
E1 proteins; Poly 


5.60E-05 


144 


M22462 


Chicken protein 
p54(ets-l) 
mRNA, complete 
cds. 


0.66 


HSHP8PROT 
_1 


H;sapiens mRNA for 
HP8 protein; HP8 
peptide 


2.00E-06 


145 


U27999 


Human clone 
pDEL52A11 
HLA-C region 
cosmid 52 
genomic survey 
sequence. 


0.65 


CA18_HUMAN 


COLLAGEN ALPHA 
1(VIII) CHAIN 
PRECURSOR 
(ENDOTHELIAL 

435 collagen alpha 
1(VIII) chain precursor - 
human>GP:HSCOL8A1_ 
1 Human COL8A1 
mRNA for alpha 1(VIII) 
collagen 


5.70E-06 


146 


M54787 


N.crassa mating 
type a-1 protein 
(mt a-1) gene, 
exons 1-3. 


0.64 


I50717 


vacuolar H+-ATPase A 
subunit - chicken 
(fragment)>GP:GGU220 
78_1 Gallus gallus 
vacuolar H+-ATPase A 
subunit gene, partial cds 


0.0046 


147 


AC002094 


Genomic 
sequence from 
Human 17, 
complete 
sequence. 


0.63 


PVPVA1_1 


P;vivax pva1 gene 


0.1 


148 


U32701 


Haemophilus 
influenzae from 
bases 165345 to 
176101 (section 
16 of 163) of the 
complete 
genome. 


0.63 


FABG_HAEIN 


3-OXOACYL-[ACYL- 
CARRIER PROTEIN] 
REDUCTASE (EC 
1.1.1.100) (3- 
KETOACYL-ACYL 
CARRIER PROTEIN 
REDUCTASE)>PIR2:D6 
4051 3-oxoacyl-[acyl- 
carrier-protein] reductase 
(EC 1.1.1.100) - 
Haemophilus influenzae 
(strain Rd 

KW20)>GP:HIU32701_ 
7 Haemophilus 


2.00E-12 


149 


Z37159 


T.brucei serum 
resistance 
associated (SRA) 
mRNA for VSG- 
like protein. 


0.61 


<NONE> 


<NONE> 


<NONE> 


150 


AF027865 


Mus musculus 
Major 

Histocompatibilit 
y Locus class II 
region. 


0.61 


A56514 


chromokinesin - 
chicken>GP:GGU18309 
_1 Gallus gallus 
chromokinesin mRNA, 
complete cds 


0.045 


151 


U40938 


Caenorhabditis 
elegans cosmid 
D1009. 


0.61 


YA53_SCHPO 


HYPOTHETICAL 24.2 
KD PROTEIN 
C13A11.03 IN 
CHROMOSOME 
I>GP:SPAC13A11_3 
S;pombe chromosome I 
cosmid c13A11; 
Unknown; 
SPAC13A11;03, 
unknown, len: 210 


1.90E-24 


152 


I16670 


Sequence 1 from 
patent US 
5476781. 


0.59 


CELF21F8J7 


Caenorhabditis elegans 
cosmid F21F8; Similar to 
eukaryotic aspartyl 
proteases 


0.39 


153 


Z84468 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
299D3;HTGS 
phase 1. 


0.59 


CLG1_YEAST 


CYCLIN-LIKE 
PROTEIN 
CLG1>PIR2:S37607 
cyclin-like protein 
YGL215w - yeast 
(Saccharomyces 
cerevisiae)>GP:SCYGL2 
15W_1 S;cerevisiae 
chromosome VII reading 
frame ORF 
YGL215w>GP:YSCCLG 
1CPR_1 Saccharomyces 
cerevisiae cyclin-like 
protein (CLG1) gene 


0.0015 


154 


U00054 


Caenorhabditis 
elegans cosmid 
K07E12. 


0.57 


<NONE> 


<NONE> 


<NONE> 


155 


M21207 


Synthetic SV40 T 
antigen mutant 
pseudogene, 3' 
end. 


0.57 


1CJL2 


cathepsin L (EC 
3.4.22.15) mutant 
(F(78P)L, C25S,T110A, 
E176G,D178G), 
fragment 2 - human 


0.43 


156 


AF020282 


Dictyostelium 
discoideum 
UKilvii gene, 
partial cds. 


0.56 


AC002125_4 


Homo sapiens DNA from 
chromosome 19-cosmid 
F25965, genomic 
sequence, complete 
sequence; F25965_5; 
Hypothetical 35 ;3 kDa 
protein similar to 
GTPase-activating 
proteins and orf3 from 


0.6 


157 


M86352 


Stigmatella 
aurantiaca reverse 
transcriptase (163 
RT) gene, 
complete cds. 


0.56 


AC002398_4 


Human DNA from 
chromosome 19-specific 
cosmid F25965, genomic 
sequence, complete 
sequence; F25965_3; 
Hypothetical 96 kDa 
human protein similar to 
alpha chimaerin; 
Hypothetical 
protein>GP:AC002398_4 
Human DNA from 
chromosome 19-specific 
cosmi 


4.50E-06 


158 


AC003101 


*** 

SEQUENCING 
IN PROGRESS 
*** Homo 
sapiens 

chromosome 17, 
clone 

HRPC41C23; 
HTGS phase 1, 
33 unordered 
pieces. 


0.54 


<NONE> 


<NONE> 


<NONE> 


159 


B12117 


F5L15-T7 IGF 
Arabidopsis 
thaliana genomic 
clone F5L15. 


0.54 


CEF32H2_5 


Caenorhabditis elegans 
cosmid F32H2, complete 
sequence; F32H2;5; 
Similarity to Chicken 
fatty acid synthase 
(SW:P12276); cDNA 
EST yk16c2;5 comes 
from this gene; cDNA 
EST yk113h6;5 comes 


1 


160 


AE000664 


Mus musculus 
TCR beta locus 
from bases 
250554 to 501917 
(section 2 of 3) of 
the complete 
sequence. 


0.54 


CET01G9_6 


Caenorhabditis elegans 
cosmid T01G9, complete 
sequence; T01G9;4; 
CDNA EST yk29b7;5 
comes from this gene 


0.84 





161 


B12117 


F5L15-T7 IGF 
Arabidopsis 
thaliana genomic 
clone F5L15. 


0.54 


A39718 


nicotinic acetylcholine 
receptor alpha chain - 
marbled electric ray 
(fragments) 


0.27 


162 


Z71261 


Caenorhabditis 
elegans cosmid 
F21C3, complete 
sequence. 


0.5 


KDGE_DROME 


EYE-SPECIFIC 
DIACYLGLYCEROL 
KINASE (EC 2.7.1.107) 
(RETINAL 
DEGENERATION A 
PROTEIN) 
(DIGLYCERIDE 
KINASE) 
(DGK)>GP:DRODAGK 
_1 Fruit fly mRNA for 
diacylglycerol kinase, 
complete cds 


4.60E-05 


163 


M61831 


Human S- 
adenosylhomocys 
teine hydrolase 
(AHCY) mRNA, 
complete cds. 


0.49 


P2C2_ARATH 


PROTEIN 
PHOSPHATASE 2C (EC 
3.1.3.16) 
(PP2C)>PIR2:S55457 
phosphoprotein 
phosphatase (EC 
3.1.3.16) 2C - 
Arabidopsis 
thaliana>GP:ATHPP2CA 
_1 Arabidopsis thaliana 
mRNA for protein 
phosphatase 2C 


5.60E-08 


164 


U42608 


Glycine max 
clathrin heavy 
chain mRNA, 
complete cds. 


0.48 


<NONE> 


<NONE> 


<NONE> 


165 


Z93042 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
6B17; HTGS 
phase 1. 


0.47 


PYRD_BACSU 


DIHYDROOROTATE 
DEHYDROGENASE 

(EC 1.3.3.1) 

(DIHYDROOROTATE 

OXIDASE) 

(DHODEHASE)>PIR1: 
H39845 dihydroorotate 
oxidase (EC 1.3.3.1)- 
Bacillus 

subtilis>GPN:BSUB000 
9_25 Bacillus subtilis 
complete genome 
(section 9 of 21): from 
1598421 to 1807200; 


0.002 


166 


AC000044 


Human 
Chromosome 
22q13 Cosmid 
Clone p76e10, 
complete 
sequence. 


0.47 


MATK_MARPO 


PROBABLE INTRON 
MATURASE>PIR2:A05 
034 hypothetical protein 
370i - liverwort 
(Marchantia polymorpha) 
chloroplast>GP:CHMPX 
X_21 Liverwort 
Marchantia polymorpha 
chloroplast genome 
DNA- ORF370i 


0.0011 


167 


X51508 


Rabbit mRNA for 
aminopeptidase N 
(partial). 


0.47 


S45361 


LRR47 protein - fruit fly 
(Drosophila 

melanogaster)>GP:DML 
RR47_1 D;melanogaster 
mRNA for LRR47 


5.30E-07 


168 


Z67035 


H.sapiens DNA 
segment 
containing (CA) 
repeat; clone 
AFM323yf1; 
single read. 


0.45 


JQ2246 


zz.jR catnepsm u 
inhibitor protein 
precursor - 

potato>GP:POTCATHD 
_1 Potato cathepsin D 
inhibitor protein mRNA, 
complete cds 


0.79 


169 


Z93042 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
6B17; HTGS 
phase 1. 


0.44 


SMU31768_1 


Schistosoma mansoni 
elastase gene, 3045 bp 
clone, complete cds 


0.0022 


170 


L11172 


Plasmodium 
falciparum RNA 
polymerase I 
gene, complete 
cds. 


0.43 


HUMPKD1G0 
8 J 


Homo sapiens polycystic 
kidney disease (PKD1) 
gene, exons 43-46; 
Polycystic kidney disease 
1 protein 


1 


171 


Z95889 


Human DNA 

sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
211A9;HTGS 
phase 1. 


0.43 


A09811_1 


R;norvegicus mRNA for 
BRL-3A binding protein; 
Author-given protein 
sequence is in conflict 
with the conceptual 
translation 


0.00083 


172 


U32772 


Haemophilus 
influenzae from 
bases 954819 to 
966363 (section 
87 of 163) of the 

complete 
genome. 


0.43 


YPT2_CAEEL 


HYPOTHETICAL 21.6 
KD PROTEIN F37A4.2 
IN CHROMOSOME 
III>PIR2:S44639 
F37A4.2 protein - 

Caenorhabditis 
elegans>GP:CELF37A4_ 

8 Caenorhabditis elegans 
cosmid F37A4 


2.50E- 
28 




173 


Z99281 


Caenorhabditis 
elegans cosmid 
Y57G11C, 
complete 
sequence. 


0.42 


PTU19464J 


Paramecium tetraurelia 
outer arm dynein beta 
heavy chain gene, 
complete cds 


1 
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\ 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


174 


X04571 


Human mRNA 
for kidney 
epidermal growth 
factor (EGF) 
precursor. 


0.42 


YEK9_YEAST 


HYPOTHETICAL 53.9 
KD PROTEIN IN AFG3- 
SEB2 INTERGENIC 
REGION>PIR2:S50477 
hypothetical protein 
YER019w- yeast 


0.99 


(Saccharomyces 
cerevisiae)>GP:SCE9537 
_20 Saccharomyces 
cerevisiae chromosome 
V cosmids yjj /,yjo\ 9 
9495, 9867, and lambda 
clone 5898 





175 


U32772 


Haemophilus 
influenzae from 
bases 954819 to 
966363 (section 
87 of 163) of the 
complete 
genome. 


0.41 


YPT2_CAEEL 


HYPOTHETICAL 21.6 
KD PROTEIN F37A4.2 
IN CHROMOSOME 
III>PIR2:S44639 
F37A4.2 protein - 
Caenorhabditis 
elegans>GP:CELF37A4_ 
8 Caenorhabditis elegans 
cosmid F37A4 


7.80E- 
21 






176 


AC002053 


Human 
Chromosome 
9p22 Cosmid 
Clone 92fS, 
complete 
sequence. 


0.4 


HSU33837J 


Human glycoprotein 
receptor gp330 precursor, 
mRNA, complete cds 


1 




177 


U88309 


Caenorhabditis 
elegans cosmid 
T23B3. 


0.4 


DROMTTGN 


Drosophila melanogaster 

mitochondrial 

cytochrome c oxidase 
subunit I (COI) gene, 5' 
end, Trp-, Cys-, and Tyr- 
tRNA genes, NADH 
dehydrogenase subunit 2 
(ND2) gene, 3' end 


0.99 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


178 


M34025 


Human fetal Ig 
heavy chain 
variable region 
(clone M44) 
mRNA, partial 
cds. 


0.39 


DNA2_YEAST 


DNA REPLICATION 
HELICASE 
DNA2>PIR2:S48904 
probable purine 
nucleotide-binding 
protein YHR164c - yeast 
(Saccharomyces 
cerevisiae)>GPN:YSCH9 
986_3 Saccharomyces 
cerevisiae chromosome 
VIII cosmid 9986; 
Dna2p: DNA replication 
helicase; YHR1640GP: 


1 


179 


AC002395 


Homo sapiens; 
HTGS phase 1, 
127 unordered 
pieces. 


0.39 


VVMUMPE 


NONSTRUCTURAL 
PROTEIN V 
(NONSTRUCTURAL 
PROTEIN NS1) 


0.11 


180 


AC003101 


*** 

SEQUENCING 
IN PROGRESS 
*** Homo 
sapiens 

chromosome 17, 
clone 

HRPC41C23; 
HTGS phase 1, 
33 unordered 
pieces. 


0.39 


YLK2_CAEEL 


HYPOTHETICAL 122.7 
KD PROTEIN D1044.2 
IN CHROMOSOME 
III>GP:CELD1044_4 
Caenorhabditis elegans 
cosmid D1044 


0.0001 


181 


Z54335 


Human DNA 
sequence from 
cosmid LI 7A9, 
Huntington's 
Disease Region, 

chromosome 
4p16.3. Contains 
VNTR and a CpG 
island. 


0.39 


HUMNFAT3A_1 


Homo sapiens NF-AT3 
mRNA, complete cds 


1.60E- 
06 





Table 2 



Docket No. 1480P 




Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 

Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 




U95743 


Homo sapiens 
chromosome 16 
BAC clone 
CIT987-SK65D3, 
complete 
sequence. 


0.38 


CEZC434_6 


Caenorhabditis elegans 

cosmid ZC434, complete 

sequence; ZC434;6; 
CDNA EST CEESO02F 
comes from this gene; 
cDNA EST CEESS60F 
comes from this gene 


0.18 


183 


AC001229 


Sequence of BAC 

F5I14 from 

Arabidopsis 

thaliana 

chromosome 1, 

complete 

sequence. 


0.34 


HSOCAMJ 


H;sapiens mRNA for 
immunoglobulin-like 
domain-containing 1 
protein 


0.051 




X01703 


Human gene for 
alpha-tubulin (b 
alpha 1). 


0.33 


NTC3_MOUS 


NEUROGENIC LOCUS 

NOTCH 3 

PROTEIN>PIR2:S45306 

notch 3 protein - 
mouse>GP:MMNOTC_1 
M;musculus mRNA for 
Notch 3 


0.012 


185 


Z82189 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
170A21;HTGS 
phase 1. 


0.31 


LG106_3 


Lemna gibba negatively 
light-regulated mRNA 
(Lgl06); Second longest 
ORF (2) 


0.27 


186 


Z98051 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
501A4; HTGS 
phase 1. 


0.3 


S34960 


NADH dehydrogenase 
(ubiquinone) (EC 
1.6.5.3) chain 5- 
Crithidia oncopelti 
mitochondrion 
(SGC6)>GP:MICOCNN 
R_3 Crithidia oncopelti 
mitochondrial ND4, 
ND5, COI, 12S 
ribosomal RNA genes for 
NADH dehydrogenase 
subunit 4/5, cytochrome 
oxidase subun 


0.25 
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WSS 

Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


187 


Z98749 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
449017; HTGS 
phase 1. 


0.3 


SCKC_LEIQH 


CHARYBDOTOXIN 
(CHTX) (CHTX- 
LQ1)>PIR2:A60963 
charybdotoxin 1 - 
scorpion (Leiurus 

quinquestriatus)>3D:2CR 
D Charybdotoxin (nmr, 
12 structures) - scorpion 
(Leiurus quinquestriatus) 


0.12 


188 


X96763 


C.albicans CDC4 
gene. 


0.29 


CECC4J 


Caenorhabditis elegans 
cosmid CC4, complete 
sequence; CC4;a; Protein 
predicted using 
Genefinder; preliminary 
prediction 


1.30E- 
17 


189 


U38804 


Porphyra 

purpurea 

chloroplast 

genome, 

complete 

sequence. 


0.28 


HIVHCDR3C 
_1 


Human 

immunodeficiency virus 
type 1 heavy-chain 
complementarity- 
determining region 3 
mRNA (clone 1 1), partial 
cds; Heavy-chain 
complementarity- 
determining region 3 
(CDR3) from HIV 
gp120- 

>GP:HIVHCDR3IJ 
Human 

immunodeficiency virus 
type i nc 


1 


190 


U20657 


Human ubiquitin 
protease (Unph) 
proto-oncogene 
mRNA, complete 
cds. 


0.28 


HSU20657J 


Human ubiquitin 
protease (Unph) proto- 
oncogene mRNA, 
complete cds 


5.60E- 
12 


191 


AC002037 


Human 

Chromosome 11 

Overlapping 

Cosmids 

cSRL72g7 and 

cSRL140b8, 

complete 


0.27 


VRP1_YEAST 


VERPROLIN>GP:SCVE 
RPRL_1 S;cerevisiae 
(A364) gene for 
verprolin 


2.00E- 
11 
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NHK 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 






sequence. 










192 


U58748 


Caenorhabditis 
elegans cosmid 
ZK180. 


0.27 


EXLP_TOBAC 


PISTIL-SPECIFIC 
EXTENSIN-LIKE 
PROTEIN PRECURSOR 
(PELP)>PIR2:JQ1696 
pistil extensin-like 
protein precursor (clone 
pMG15) - common 
tobacco>GP:NTPMG15_ 
1 N;tabacum mRNA for 
pistil extensin like 
protein 


4.10E- 

12 


193 


Z68013 


Caenorhabditis 
elegans cosmid 
W02H3, 
complete 
sequence. 


0.26 


<NONE> 


<NONE> 


<NONE 
> 


194 


AF017042 


Dictyostelium 
discoideum LTR- 
retrotransposon 
Skipper, partial 

genomic 
sequence, 5' end. 


0.26 


SPBC31F10_14 


S;pombe chromosome II 
cosmid c31F10; 
Hypothetical protein; 
SPBC31F10;14c, 
unKnown, len. j jooad, 
some similarity eg; to 
YJR140C, 

YJ9H_YEAST, P47171, 
involved in cell cycle 
regulation 


1 


195 


B03174 


cSRL-16e2-u 
cSRL flow sorted 
Chromosome 11 
specific cosmid 
Homo sapiens 
genomic clone 
cSRL-16e2. 


0.26 


CELC30E1_7 


Caenorhabditis elegans 
cosmid C30E1 


0.38 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


196 


X70810 


E.gracilis 
chloroplast 
complete 
genome. 


0.25 


CEK10H10_8 


Caenorhabditis elegans 
cosmid K10H10, 
complete sequence; 
K10H10;k; Protein 
predicted using 
Genefinder; preliminary 
prediction 


0.98 


197 


U80024 


Caenorhabditis 
elegans cosmid 
C18B10. 


0.25 


MMAF001794 
_1 


Mus musculus Treacher 
Collins Syndrome protein 
(Tcofl)mRNA, 

complete cds; Putative 
nucleolar 

phosphoprotein; similar 
to Homo sapiens 
Treacher Collins 
syndrome TCOF1 protein 
encoded>GP:MMAF001 
794_1 Mus musculus 
Treacher Collins 
Syndrome p 


0.017 


198 


AC000591 


Drosophila 
melanogaster 
(subclone 9 g3 
from P1 DS01486 
(D32)) DNA 
sequence, 
complete 
sequence. 


0.25 


YHGE_ECOLI 


HYPOTHETICAL 64.6 
KD PROTEIN IN 
MRCA-PCKA 
INTERGENIC REGION 
(F574)>PIR2:E65135 
hypothetical 64.6 kD 
protein in mrcA-pckA 
intergenic region - 
Escherichia coli (strain 
K- 

12)>GP:ECAE000415_7 
Escherichia coli , mrcA, 
yrfE, yrfF, yrfG, yrfH, 
yrfl 


0.00068 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


199 


AC000591 


Drosophila 
melanogaster 
(subclone 9 g3 
from P1 DS01486 
(D32)) DNA 
sequence, 
complete 
sequence. 


0.25 


YHGE_ECOLI 


HYPOTHETICAL 64.6 
KD PROTEIN IN 
MRCA-PCKA 
INTERGENIC REGION 
(F574)>PIR2:E65135 
hypothetical 64.6 kD 
protein in mrcA-pckA 
intergenic region - 
Escherichia coli (strain 
K- 

12)>GP:ECAE000415_7 
Escherichia coli , mrcA, 
yrfE, yrfF, yrfG, yrfH, 
yrfl 


0.00068 














200 


Z99571 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
388N15;HTGS 
phase 1. 


0.24 


YA53_SCHPO 


HYPOTHETICAL 24.2 
KD PROTEIN 
C13A11.03 IN 
CHROMOSOME 
I>GP:SPAC13A11_3 
S;pombe chromosome I 
cosmid c13A11; 
Unknown; 
SPAC13A11;03, 
unknown, len: 210 


0.017 


201 


U00672 


Human 
interleukin-10 
receptor mRNA, 
complete cds. 


0.24 


TFDP00900 


- Polypeptides entry for 
factor Oct-2.5 


1.00E- 
05 


202 


AC003061 


*** SEQUENCING 
IN PROGRESS 
*** Mouse 
Chromosome 6 
BAC clone 
b245c12; HTGS 
phase 2, 8 
ordered pieces. 


0.23 


CG1_HUMAN 


CGI 

3_1 Human Xq28 
mRNA, complete cds; 
Orf 


0.00078 


203 


AF009420 


Homo sapiens 
microsatellite 
sequence in the 
HNF3a gene. 


0.22 


PN0675 


collagen alpha 1 (XVIII) 
chain - mouse 
(fragment)>GP:MUSCO 
LLAG_1 Mouse mRNA 
for collagen, partial cds 


0.00072 
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I Nearest 
Neighbor 



(BlastN vs. 
Genbank) 



Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 



SEQ 
ID 


ACCESSION 


204 


B18861 



205 



U00672 



206 



X52105 



207 



L07628 



208 



Z49631 



209 



Z87893 



DESCRIPTION 

F20C18-Sp6 IGF 
Arabidopsis 
thaliana genomic 
clone F20C18. 



Human 
interleukin-10 
receptor mRNA, 
complete cds. 



Dictyostelium 
discoideum SP60 
gene for spore 
coat protein. 



Saccharopolyspor 
a erythraea 
insertion 

sequence IS1136, 
copy B, 3' end. 



S.cerevisiae 
chromosome X 
reading frame 
ORF YJR131W. 



F.rubripes GSS 
sequence, clone 
043C17aB8. 



P 

VALUE 

0.22 



0.22 



ACCESSION 



TFDP00659 



0.18 



TFDP00900 



<NONE> 



0.17 



0.16 



0.16 



D88764_1 



YSCDAL1A_1 



CELC27A12_8 



DESCRIPTION 

- Polypeptides entry for 
factor PR 



- Polypeptides entry for 
factor Oct-2.5 



<NONE> 



Rana catesbeiana mRNA 
for alpha 2 type I 
collagen, complete cds 



Saccharomyces 
cerevisiae allantoinase 
(DAL1) gene, complete 
cds 



Caenorhabditis elegans 
cosmid C27A12; Partial 
CDS; this gene begins in 
the neighboring clone; 
coded for by C; elegans 
cDNA yk127f1;3; coded 
for by C; elegans cDNA 
yk127f1;5 



P 

VALUE 

0.0003 



1.00E- 
05 



<NONE 

> 



0.00021 



1.30E- 
07 



210 



U92852 



Rhoiptelea 
chiliantha 
maturase (matK) 
gene, chloroplast 
gene encoding 
chloroplast 
protein, complete 
cds. 



0.15 



SEU40259_5 



Staphylococcus 
epidermidis trimethoprim 
resistance plasmid 
pSK639; Orf53 



0.95 
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■ 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


211 


X62620 


B.mori Abd-A 
gene homeobox. 


0.15 


ATAP22J6 


Arabidopsis thaliana 
DNA chromosome 4, 
ESSA I AP2 contig 
fragment No; 2; 
Hypothetical protein; 
Similarity to NADH 
dehydrogenase, 
Chondrus crispus; 


0.75 


212 


J02079 


Epstein-Barr virus 
simple repeat 
array (ir3). 


0.15 


A38346 


ultra-high-sulfur keratin 
1 - 

mouse>GP:MUSSER1_1 
Mouse serine 1 ultra high 
sulfur protein gene, 
complete cds; Putative 


7.50E- 
05 


213 


M35027 


Vaccinia virus, 

complete 

genome. 


0.14 


MTF1_FUSNU 


MODIFICATION 
METHYLASE FNUDI 
(EC 2.1.1.73) 
(CYTOSINE-SPECIFIC 
METHYLTRANSFERA 
SE FNUDI) (M.FNUDI) 


0.87 


214 


AC003058 


*** 

SEQUENCING 
IN PROGRESS 
*** Arabidopsis 
thaliana 'IGF' 
BAC 'T27F23' 
genomic 
sequence near 
marker 

'cicoeEos 1 ; 

HTGS phase 1, 8 
unordered pieces. 


0.14 


HEXA_DICDI 


BETA- 

HEXOSAMINIDASE 
ALPHA CHAIN 
PRECURSOR (EC 
3.2.1.52) (N-ACETYL- 
BETA- 

GLUCOSAMINIDASE) 
(BETA-N- 

ACETYLHEXOSAMINI 

DASE)>PIR2:A30766 

beta-N- 

acetylhexosaminidase 
(EC 3.2.1.52) A 
precursor - slime mold 
(Dictyostelium 
discoideum)>GP:DDINA 
GA_1 D;d 


0.006 




Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


215 


AC001229 


Sequence of BAC 

F5I14 from 

Arabidopsis 

thaliana 

chromosome 1, 

complete 

sequence. 


0.13 


A49281 


pol protein - simian T- 
cell lymphotropic virus 
type 1, STLV-1 (isolate 
Bab34) 

(fragment)>GP:STVBAB 
POLA J Simian T-cell 
leukemia virus PCR 
derived (pol) gene, 
partial sequence 
BAB34POL; Bases 
4779-4918 EMBL ATK 
numbering system; 
BAB34POL 


0.77 


216 


U46067 


Capra hircus 
beta-mannosidase 
mRNA, complete 
cds. 


0.12 


S70663 


lectin heavy chain, N- 
acetylgalactosamine- 
specific - Entamoeba 
histolytica 

(fragment)>GP:EHU334 
43_1 Entamoeba 
histolytica GalNAc lectin 
heavy subunit (hgl4) 
gene, partial cds; N- 
acetylgalactosamine 
adherence lectin heavy 
subunit 


0.8 


217 


AC000380 


*** SEQUENCING 
IN PROGRESS 
*** Human 
Chromosome 3 
pacpDJ70ill; 
HTGS phase 1, 2 
unordered pieces. 


0.12 


ATFCA8J9 


Arabidopsis thaliana 
DNA chromosome 4, 
ESSA I contig fragment 
No; 8; Unnamed protein 
product 


0.64 
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■ 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


218 


X61207 


A.brasilense hisB, 

H, A, F and E 

genes for 

imidazole 

glycerolphosphat 

e dehydratase, 

glutamine 

amidotransferase, 

phosphorybosilfo 

rmimino-5- 

amino- 

phosphorybosil- 
4- 

imidazolecarboxa 
mide isomerase, 
cyclase and 
phosphorybosil- 
AMP- 

cyclohydrolase. 


0.12 


OCCL02J 


O;circumcincta colost-2 
gene; Cuticular collagen 


0.0074 


219 


AF014259 


HIV-1 Patient 
1088 from 
Edinburgh, MA- 

p17 (gag) gene, 
partial cds. 


0.11 


DMU88570J 


Drosophila melanogaster 
CREB-binding protein 
homolog mRNA, 
complete cds; CBP 


1 


220 


AC000636 


Drosophila 
melanogaster 
(subclone 2 cl 1 
from P1 DS07660 
(D44)) DNA 
sequence, 
complete 
sequence. 


0.11 


A64829 


hypothetical protein in 
dmsC 3' region - 
Escherichia coli (strain 
K- 

12)>GP:ECAE000192_1 
Escherichia coli , ycaD, 
ycaK, pflA, pflB, focA 
genes from bases 944908 
to 955952 (section 82 of 
400) of the complete 
genome; Hypothetical 
protein in dmsC 


0.051 


221 


AC002428 


Human BAC 
clone GS039E22 
from 5q31, 
complete 
sequence. 


0.11 


HSNMYC2J 


Human N-myc gene exon 
2; Put; N-myc protein (aa 
1-263) (953 is 1st base in 
codon) 


0.00014 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


222 


L40949 


Homo sapiens 
(clone AT7-5eu) 
opioid-receptor- 
like protein 
mRNA, 5' end. 


0.11 


CEUNC93_2 


C;elegans unc-93 gene; 
Protein 2 


1.20E- 
13 


223 


AL008636 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
722E9; HTGS 
phase 1. 


0.1 


XELCOL2A1 
A J 


Xenopus laevis alpha-1 
collagen type II' mRNA, 
complete cds; Alpha-1 
type II' collagen 


2.60E- 
06 


224 


D86993 


Human (lambda) 
DNA for 
immunoglobulin 
light chain. 


0.1 


CELM02B7J2 


Caenorhabditis elegans 
cosmid M02B7 


1.80E- 
09 


225 


AC002539 


Homo sapiens 
chromosome 17, 
clone 195o20, 
complete 
sequence. 


0.098 


MTCY7D11_17 


Mycobacterium 
tuberculosis cosmid 
Y7D11; Unknown; 
MTCY07D11;17c; 
unknown, len: 186 aa, 
FASTA best: Q10390 
Y009_MYCTU 
hypothetical 31;0 KD 
protein MTCY190;09C 
(299 aa) opt: 355 z-score: 
316;8 


0.026 


226 


M88165 


Human inter- 
alpha-trypsin 
inhibitor light 
chain (ITI) gene, 
exon 1. 


0.096 


A54161 


ryanodine-binding 
protein alpha form - 
bullfrog>GP:D21070J 
Rana catesbeiana mRNA 
for bullfrog skeletal 
muscle calcium release 
channel (ryanodine 
receptor) alpha 
isoform (RyR1), complete 
cds; Ryanodine receptor 
alpha isoform 


1 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


227 


Z92851 


Caenorhabditis 
elegans DNA *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
Y39G8; HTGS 
phase 1. 


0.082 


CYA7_BOVIN 


ADENYLATE 
CYCLASE, TYPE VII 
(EC 4.6.1.1) (ATP 
PYROPHOSPHATE- 
LYASE) (ADENYLYL 
CYCLASE) 


0.3 


228 


L00638 


Arabidopsis 
thaliana ubiquitin 
conjugating 
enzyme exons 2- 
4. 


0.072 


NUCM_TRYBB 


NADH-UBIQUINONE 
OXIDOREDUCTASE 
49 KD SUBUNIT 
HOMOLOG (EC 1.6.5.3) 
(NADH 

DEHYDROGENASE 
SUBUNIT 7 
HOMOLOG)>PIR2:A35693 NADH 
dehydrogenase (EC 
1.6.99.3) chain 7 - 
Trypanosoma brucei 
mitochondrion (SGC6) 


0.24 


229 


U49169 


Dictyostelium 
discoideum V- 
ATPase A 
subunit (vatA) 
mRNA, complete 
cds. 


0.071 


MMU65594J 


Mus musculus Brca2 
mRNA, complete cds; 
Similar to human breast 
cancer susceptibility gene 
BRCA2; Allele: wild 
type; putative tumor 
suppressor 


1 


230 


AF001549 


Homo sapiens 
chromosome 16 
BAC clone 
CIT987SK- 
270G1 complete 
sequence. 


0.07 


PM22_HUMAN 


PERIPHERAL MYELIN 
PROTEIN 22 (PMP- 
22)>PIR2:JN0503 
peripheral myelin protein 
22- 

human>GP:HUMGAS3 
X_1 Human peripheral 
myelin protein 22 
(GAS3) mRNA, 
complete 

cds>GP:HUMPMP22_l 
Human peripheral myelin 
protein 22 mRNA, 
complete 

cds>GP:HUMPMP22 


0.0078 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


231 


L36829 


Mus musculus 
alphaA-crystallin- 
binding protein I 
(AlphaA- 
CRYBP1) gene, 
complete cds. 


0.066 


<NONE> 


<NONE> 


<NONE 
> 


232 


AC000159 


*** SEQUENCING 
IN PROGRESS 
*** Human BAC 
Clone 11q13; 
HTGS phase 1, 
10 unordered 
pieces. 


0.058 


CEZK863J 


Caenorhabditis elegans 
cosmid ZK863, complete 
sequence; ZK863;2; 
Similar to collagen 


1 


233 


AC000159 


*** 

SEQUENCING 
IN PROGRESS 
*** Human BAC 
Clone 11q13; 
HTGS phase 1, 
10 unordered 
pieces. 


0.058 


CAC2_HAECO 


CUTICLE COLLAGEN 
2C 

(FRAGMENT)>GP:HAE 
COL2C_1 H;contortus 
collagen 2C mRNA, 
3'end 


1.20E- 
08 


234 


Z23908 


H. sapiens 
(D5S630) DNA 
segment 
containing (CA) 
repeat; clone 
AFM268zd9; 
single read. 


0.057 


VEU34999J 


Venezuelan equine 
encephalitis virus 
nonstructural and 
structural polyprotein 
genes, complete cds; 
Nonstructural 
polyprotein; Internal stop 
codon, readthrough 
occurs 5% of the time 


0.0002 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


235 


B21875 


T3E8-Sp6 TAMU 
Arabidopsis 

thaliana genomic 
clone T3E8. 


0.055 


YRR2_CAEEL 


HYPOTHETICAL 91.1 
KD PROTEIN R144.2 
IN CHROMOSOME 
III>GP:CELR144_7 
Caenorhabditis elegans 
cosmid R144; Coded for 
by C; elegans cDNA 
CEESP84R; coded for by 
C; elegans cDNA 
yk23c4;5; coded for by 
C; elegans cDNA 
yk44f9;5; coded for by 
C; eleg 


0.68 


236 


Z98303 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
140H19; HTGS 
phase 1. 


0.048 


AC002330_3 


Arabidopsis thaliana 
BAC T10P11, complete 
sequence; Putative zinc- 
finger protein; C2H2 Zn- 
finger signature from 
position 80 to 100 
[CEICNKGFQRDQNLQ 
LHRRGH] 


0.99 


237 


D49911 


Thermus 
thermophilus 
UvrA gene, 
complete cds. 


0.044 


APP1_MOUSE 


AMYLOID-LIKE 
PROTEIN 1 
PRECURSOR 
(APLP)>PIR2:A46362 
amyloid precursor-like 
protein - 

mouse>GP:MUSAPLP_ 
1 Mouse amyloid 
precursor-like protein 
mRNA, complete cds 


8.90E- 
06 


238 


D49911 


Thermus 
thermophilus 
UvrA gene, 
complete cds. 


0.044 


MMCOL18A1 
1_2 


Mus musculus alpha- 
1 (XVIII) collagen 
(COL18A1) gene, exons 
40-43, complete cds 


1.60E- 
06 
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Nearest 






Nearest 
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In 


Neighbor 
(BlastN vs. 
Genbank) 






Neighbor 
(BlastX vs. 
Non- 
Redundant 
















Proteins) 








SEQ 


ACCESSION 


DESCRIPTION 


P 


ACCESSION 


DESCRIPTION 


P 




ID 






VALUE 






VALUE 




239 


X78119 


P.amygdalus, 
Batsch (Texas) 
prul mRNA. 


0.042 


CA44 HUMA 
N 


COLLAGEN ALPHA 
4(IV) CHAIN 
PRECURSOR>PIRl:CG 
HU1B collagen alpha 
4(IV) chain precursor - 
human>GP:HSCOL4A4__ 


2.00E- 
06 













1 H;sapiens mRNA for 
collagen type IV alpha 4 
chain; Type IV collagen 
alpha 4 chain 






240 


U72877 


Rana catesbeiana 
L-epinephrine 
transporter 
mRNA, complete 
cds. 


0.041 


YRR6_MYCCA 


HYPOTHETICAL 33.0 
KD PROTEIN IN LICA 
3'REGION (ORF 
R6)>PIR2:S42125 
hypothetical protein 3 - 
Mycoplasma capricolum 
(SGC3)>GP:MYCRPM 
H_6 M; capricolum 
rpmH, rnpA and licA 
gene; Orf R6 


0.0008 


241 


L39891 


Homo sapiens 
polycystic kidney 
disease- 
associated protein 
(PKD1) gene, 
complete cds. 


0.04 


MUC2_HUMAN 


MUCIN 2 

(INTESTINAL MUCIN 
2) (FRAGMENTS) 


5.90E- 
05 




242 


L40390 


Candida glabrata 
ERG3 gene, 
complete cds 


0.039 


G01763 


atrophin-1 - 

human>GP:HSU23851_1 
Human atrophin-1 
mRNA, complete cds 


9.00E- 
07 




243 


B28113 


T2L16TRB 
TAMU 
Arabidopsis 
thaliana genomic 
clone T2L16. 


0.038 


CELZK1248_14 


Caenorhabditis elegans 
cosmid ZK1248 


1.60E- 
18 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


244 


AC000030 


00175, complete 
sequence. 


0.033 


ATFCA8_40 


Arabidopsis thaliana 
DNA chromosome 4, 
ESSA I contig fragment 
No; 8; Glycerol-3- 
phosphate permease 
homolog; Similarity to 
glycerol-3-phosphate 
permease - Haemophilus 
influenzae 


0.63 


245 


B10738 


F13G15-Sp6 IGF 
Arabidopsis 
thaliana genomic 
clone F13G15. 


0.032 


D87521J 


Mus musculus DNA- 
PKcs mRNA, complete 
cds 


0.21 


246 


AF024503 


Caenorhabditis 
elegans cosmid 
F31F4. 


0.03 


I38344 


titin - human 


1 


247 


Z49888 


Caenorhabditis 
elegans cosmid 
F47A4, complete 
sequence. 


0.027 


KSU52064J 


Kaposi's sarcoma- 
associated herpes-like 
virus ORF73 homolog 
gene, complete cds; 
Herpesvirus saimiri 
ORF73 

homolog>GP:KSU75698 
_78 Kaposi's sarcoma- 
associated herpesvirus 
long unique region, 80 
putative ORF's and 
kaposin gene, complete 

tHq- OR 

tUo, WIN. 


3.40E- 
10 


248 


Z83822 


Human DNA 
sequence from 
PAC 306D1 on 
chromosome X 
contains ESTs. 


0.025 


GRSB BACB 
R 


GRAMICIDIN S 
SYNTHETASE II 
(GRAMICIDIN S 
BIOSYNTHESIS GRSB 
PROTEIN) (EC 6.-.-.-) 


1 


249 


Z94161 


Human DNA 
sequence *** 

SEQUENCING 
IN PROGRESS 
*** from clone 
N102C10; HTGS 
phase 1. 


0.025 


S16323 


hypothetical protein - 
Arabidopsis 
thaliana>GP:ATHB1_1 
A;thaliana homeobox 
gene Athb-1 mRNA; 
Open reading frame 


0.0079 
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■ 


Neighbor 
(BlastN vs. 
Genbank) 






Neighbor 
(BlastX vs. 
Non- 
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INHHn 








Proteins) 






SEQ 


ACCESSION 


DESCRIPTION 


P 


ACCESSION 


DESCRIPTION 


P 


ID 






VALUE 






VALUE 


250 


AC002094 


Genomic 
sequence from 
Human 17, 
complete 
sequence. 


0.021 


S57447 


HPBRII-7 protein - 
human>GP:HSHPBRII4 

1 H;sapiens HPBRII-4 
mRNA>GP:HSHPBRII7 
J H;sapiens HPBRII-7 
gene 


8.20E- 
08 


251 


D79994 


Human mRNA 
for KIAA0172 
gene, partial cds. 


0.021 


CER10H10J 


Caenorhabditis elegans 
cosmid R10H10, 
complete sequence; 
R11A8;7; Protein 
predicted using 
Genefinder; Similarity to 
Mouse ankyrin (PIR Acc; 
No; S37771); cDNA EST 
CEESX25F comes from 
this gene; 


7.00E- 
16 


252 


Z97635 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
438L4; HTGS 
phase 1. 


0.017 


CELW05H7_4 


Caenorhabditis elegans 
cosmid W05H7 


0.24 


253 


X84996 


X.laevis mRNA 
for selenocysteine 
tRNA acting 
factor (Staf). 


0.017 


JN0786 


integrin beta-4 chain 
precursor - mouse 


0.088 


254 


AC002543 


Human BAC 
clone RG300C03 
from 7q31.2, 
complete 
sequence. 


0.013 


MZLMTCYT 
BTJ 


Mendozellus isis 
mitochondrial NADH 
dehydrogenase, and 
cytochrome b genes, 3' 
end, and transfer RNA- 
Ser gene; This codes for 
the last 43 amino acids of 
NADH dehydrogenase 
subunit 1 followed 


0.044 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


255 


U10401 


Caenorhabditis 
elegans cosmid 
T20B12. 


0.012 


MMMHC29N 


Mus musculus major 
histocompatibility locus 
class III 

region:butyrophilin-like 
protein gene, partial cds; 
Notch4, PBX2, RAGE, 
lysophatidic acid acyl 
transferase-alpha, 
palmitoyl- 


0.069 


256 


L14593 


Saccharomyces 
cerevisiae protein 
phosphatase 
(PTC1) gene, 
complete cds. 


0.011 


D86995J 


Human (gene 1) DNA for 
phosphatase 2C motif, 
partial cds 


2.20E- 
14 


257 


U62317 


Chromosome 

22q13 BAC 

Clone 

CIT987SK- 

384D8 complete 

sequence. 


0.0093 


P2Y8_XENLA 


P2Y PURINOCEPTOR 8 
(P2Y8)>GP:XLP2Y8_1 
X;laevis mRNA for 
P2Y8 nucleotide receptor 


0.89 


258 


D29655 


Pig mRNA for 
UMP-CMP 
kinase, complete 
cds. 


0.0075 


AF004858_1 


Mus musculus platelet 
activating factor receptor 
mRNA, partial cds; PAF- 
receptor 


1 


259 


AF002992 


Homo sapiens 
cosmid from 
Xq28, complete 
sequence. 


0.0054 


FBN1_BOVIN 


FIBRILLIN 1 
PRECURSOR>PIR2:A5 

5567 fibrillin I - 
bovine>GP:BOVXAAA 

A_l Bos taurus mRNA, 

complete cds; Putative 


0.0004 


260 


B20752 


T19M2-T7 
TAMU 
Arabidopsis 
thaliana genomic 
clone T19M2. 


0.0043 


HSVT1IEPJ 


Feline herpesvirus type 1 
gene for immediate early 
protein, complete cds; 
Feline herpesvirus type 1 
immediate early protein 


3.90E- 
05 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


261 


AB006699 


Arabidopsis 
thaliana genomic 
DNA, 

chromosome 5, 
P1 clone: MDJ22. 


0.0037 


YHV5_YEAST 


HYPOTHETICAL 143.6 
KD PROTEIN IN 
SPO16-REC104 
INTERGENIC 
REGION>PIR2:S46754 
hypothetical protein 
YHR155w- yeast 
(Saccharomyces 
cerevisiae)>GPN:YSCH9 
666_15 Saccharomyces 
cerevisiae chromosome 
VIII cosmid 9666; 
Yhr155wp; Similar to 
Sip3p (Snf 


0.077 


262 


Z99128 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
422H11;HTGS 
phase 1. 


0.0032 


ALU1_HUMAN 


!!!! ALU SUBFAMILY J 
WARNING ENTRY!!!! 


0.0087 


263 


B21848 


T2D2-Sp6 
TAMU 
Arabidopsis 
thaliana genomic 
clone T2D2. 


0.0031 


B31794 


mdm-1 protein (clone 
c103) - mouse 


1.00E- 
05 


264 


L33853 


Human germline 
immunoglobulin 
kappa chain 
variable region 
(Vk-IV subgroup) 
for anti-B- 
amyloid 

autoantibodies in 

Alzheimer's 

disease. 


0.0027 


B45550 


cytochrome b homolog - 
Plasmodium yoelii 


0.99 


265 


B36863 


HS-1042-A1- 
F01-MR.abi CIT 
Human Genomic 
Sperm Library C 
Homo sapiens 
genomic clone 


0.0027 


YQK4_CAEEL 


HYPOTHETICAL 64.3 
KD PROTEIN C56G2.4 
IN CHROMOSOME 
III>GP:CELC56G2_2 
Caenorhabditis elegans 
cosmid C56G2 


0.81 
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Nearest 
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(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 






Plate=CT 824 
Col=1 Row=K. 










266 


AC003041 


*** SEQUENCING 
IN PROGRESS 
*** Homo 
sapiens 

chromosome 17, 
clone 

HCIT307A16; 
HTGS phase 1, 
10 unordered 
pieces. 


0.0024 


GLB4_LAMSP 


GIANT HEMOGLOBIN 
AIV CHAIN 
(FRAGMENT)>PIR2:S01810 
hemoglobin AIV - 
tube worm 
(Lamellibrachia sp.) 
(fragment) 


0.94 


267 


AC002315 


Mouse BAC- 
146N21 

Chromosome X 

contains 

iduronate-2- 

sulfatase gene; 

complete 

sequence. 


0.0022 


MG42_TARMA 


SRY-RELATED 
PROTEIN MG42 
(FRAGMENT)>PIR3:I51369 
Sry-related 
sequence - Tarentola 
mauritanica 

(fragment)>GP:TELMG4 
2DNA_1 Gecko MG42 
gene, partial cds; Sry- 
related sequence 


0.99 


268 


AF016674 


Caenorhabditis 
elegans cosmid 
C03H5. 


0.0015 


SCYJL204C_1 


S;cerevisiae chromosome 
X reading frame ORF 
YJL204c 


1 


269 


AF016674 


Caenorhabditis 
elegans cosmid 
C03H5. 


0.0015 


CEM199J3 


Caenorhabditis elegans 
cosmid M199, complete 
sequence; M199;e; 
Protein predicted using 
Genefinder; preliminary 
prediction 


0.97 


270 


AF016674 


Caenorhabditis 
elegans cosmid 
C03H5. 


0.0015 


CEM1993 


Caenorhabditis elegans 
cosmid M199, complete 
sequence; M199;e; 
Protein predicted using 
Genefinder; preliminary 
prediction 


0.97 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


271 


Z54199 


L.esculentum 
DNA Ailsa craig 
encoding 1- 
aminocyclopropa 
ne-1-carboxylic 
acid oxidase. 


0.0015 


CELF20A1J 


Caenorhabditis elegans 
cosmid F20A1; Coded 
for by C; elegans cDNA 
yk9g1;3; coded for by C; 
elegans cDNA yk9g1;5; 
coded for by C; elegans 
cDNA CEESU55F; weak 
similarity to putative 


0.11 


272 


Z99943 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
313L4; HTGS 
phase 1. 


0.0014 


CEK08F8_5 


Caenorhabditis elegans 
cosmid K08F8, complete 
sequence; K08F8;5b 


0.93 


273 


S81083 


beta- 

ADD=adducin 
beta subunit 63 
kda 

isoform/membrane 
skeleton 
protein, beta - 
ADD=adducin 
beta subunit 63 
kda 

isoform/membrane 
skeleton protein 
{alternatively 
spliced, exon 10 
to 13 region} 
[human, 
Genomic, 1851 
nt, segment 3 of 

3]. 


0.0013 


MTCY277J7 


Mycobacterium 
tuberculosis cosmid 
Y277; Unknown; 
MTCY277;07c, 
unknown. len: 302 


0.0001 


274 


Z82174 


Human DNA 
sequence from 
cosmid B20F6 on 
chromosome 
22q11.2-qter. 


0.001 


FBLA HUM 
AN 


FIBULIN-1, ISOFORM 
A 

PRECURSOR>GP:HSFI 

BUA 1 H;sapiens 
mRNA for fibulin-1 A 


0.00063 


Nearest 






Nearest 








Neighbor 






Neighbor 








(BlastN vs. 
Genbank) 






(BlastX vs. 
Non- 
Redundant 














Proteins) 






SEQ 


ACCESSION 


DESCRIPTION 


P 


ACCESSION 


DESCRIPTION 


P 


ID 






VALUE 






VALUE 


275 


Z82215 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
6802; HTGS 
phase 1 . 


0.00079 


BFR1 SCHP 
O 


BREFELDIN A 
RESISTANCE 
PROTEIN>PIR2:S52239 
hba2 protein - fission 
yeast 

(Schizosaccharomyces 
pombe)>GP:SPHBA2GE 
N_l S;pombe hba2 gene 


0.15 


276 


U28153 


Caenorhabditis 
elegans UNC-76 
(unc-76) gene, 
complete cds. 


0.00071 


CX2_HEMHA 


CYTOTOXIN 2 (TOXIN 
12A) 


0.32 


277 


Z82204 


Human DNA 
sequence from 
clone J362G171. 


0.00054 


DMU34925_2 


Drosophila melanogaster 
DNA repair protein (mei- 
41) gene, complete cds, 
and TH1 gene, partial cds 


0.045 


278 


AC002530 


Human BAC 
clone RG341D10 
from 7p15-p21, 
complete 
sequence. 


0.00053 


CELT28F2_2 


Caenorhabditis elegans 
cosmid T28F2; Weak 
similarity to HSP90 


0.037 


279 


U91322 


Human 

chromosome 

16p13 BAC clone 

CIT987SK-276F8 

complete 

sequence. 


0.00051 


CEW08D2_2 


Caenorhabditis elegans 
cosmid W08D2, 
complete sequence; 
W08D2;3; Protein 
predicted using 
Genefinder>GP:CEW08 
D2__2 Caenorhabditis 
elegans cosmid W08D2; 
W08D2;3; Protein 
predicted using 
Genefinder 


0.26 
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Nearest 
Neighbor 
(BlastN vs. 

Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


280 




Human HepG2 

partial cDNA, 
clone 

hmd2b09m5. 




pot n PPVM 

rULU 11 V IN 

A 


POLYPROTEIN 
(CONTAINS: N- 
TERMINAL PROTEIN; 
HELPER COMPONENT 

PROTEINASE (EC 

"\ A 99 fl4fVPROV 49- 

50 KD PROTEIN; 
CYTOPLASMIC 
INCLUSION PROTEIN 
(CI); 6 KD PROTEIN; 

NUCLEAR 
INCLUSION PROTEIN 
A (NI-A) (EC 3.4.22.-) 
(49K PROTEINASE) (49 




281 


U91318 


Human 
chromosome 
16p13 BAC clone 
CIT987SK- 
962B4 complete 
sequence. 


0.00031 


<NONE> 


<NONE> 


<NONE> 


282 


M93406 


Human dispersed 
Alu repeats and 

repeat. 


0.0003 


VG8_SPV4 


GENE 8 

PROTEIN>PIR1:G8BPS 
V gene 8 protein - 

spiroplasma virus 4 
(SGC3) 


0.23 


283 


AC002398 


Human DNA 
from 

chromosome 19- 
specific cosmid 
F25965, genomic 
sequence, 
complete 
sequence. 


0.00021 


HMCA_DRO 
ME 


HOMEOTIC CAUDAL 
PROTEIN>PIR2 A263 57 
homeotic protein Cad - 
fruit fly (Drosophila 
melanogaster)>GP:DRO 
CADA2J 

D;melanogaster caudal 
gene (cad) encoding a 
maternal and zygotic 
transcript, exon 2; Caudal 
protein>TFD:TFDP0015 
9 - Polypeptides en 


0.021 


284 


AC002530 


Human BAC 
clone RG341D10 
from 7p15-p21, 
complete 


0.0002 


PL0009 


complement 
C3d/Epstein-Barr virus 
receptor precursor - 
human 


0.7 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 


VALUE 






sequence. 










285 


X01871 


Yeast 

mitochondrial 
ori(o) repeat unit 
of petite mutant 5 
(petite strain s- 
10/7/2). 


0.00015 


RVZMTCYT 
BTJ 


Reventazonia sp; 
mitochondrial NADH 
dehydrogenase, and 
cytochrome b genes, 3' 
end, and transfer RNA- 
Ser gene; This codes for 
the last 43 amino acids of 
NADH dehydrogenase 
subunit 1 followed 


0.73 


286 


U89984 


Acanthamoeba 
castellanii 

transformation- 

sensitive protein 
homolog mRNA, 
complete cds. 


0.00015 


ACU89984J 


Acanthamoeba castellanii 
transformation-sensitive 

protein homolog mRNA, 

complete cds; Similar to 
human transformation- 
sensitive protein: 
SwissProt Accession 
Number P31948 


4.20E- 
13 


287 


AC002365 


Homo sapiens 
chromosome X 
clone U177G4, 
U152H5 

U168D5, 174A6, 
U172D6, and 
U186B3 from 
Xp22, complete 
sequence. 


0.00011 


S10340 


DNA-directed RNA 
polymerase (EC 2.7.7.6) 
- yeast (Kluyveromyces 
marxianus var lactis) 


0.00062 


288 


AC002390 


Human DNA 
from overlapping 
chromosome 19- 
specific cosmids 
R30072 and 
R28588, genomic 
sequence, 
complete 
sequence. 


9.90E-05 


D86603J 


Mouse mRNA for Bach 
protein 1 , complete cds; 
Bachl 


1 


289 


AC002980 


Homo sapiens; 
HTGS phase 1, 
34 unordered 


9.20E-05 


TRBKPCYB 
1 


Trypanosoma brucei 
kinetoplast 

apocytochrome b gene, 


0.52 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 






pieces. 






complete cds 




290 


M99412 


Human 
interleukin-8 
receptor (IL8RB) 
gene, complete 
cds. 


4.50E-05 


S28832 


microtubule-associated 
protein HI (clone KS3.1) 
- longfin squid 
(fragment) 


0.88 


291 


AC000120 


Human BAC 
clone RG161K23 
from 7q21, 
complete 
sequence. 


4.00E-05 


SXSCRBAJ 


S;xylosus scrB and scrR 
genes; Sucrose repressor 


0.99 


292 


AC003037 


Homo sapiens; 
HTGS phase 1, 
66 unordered 
pieces. 


3.40E-05 


S13569 


hypothetical protein 5 - 
Lactococcus lactis subsp. 
lactis insertion sequence 
1076>GP:LLTLEJ 
Lactococcus lactis DNA 
for the transposon-like 
element on the lactose 
plasmid; ORF5 (AA 1 - 
43) 


0.018 


293 


Z81512 


Caenorhabditis 
elegans cosmid 
F25C8, complete 
sequence. 


2.40E-05 


MUSDBPRC 
1 


Mus musculus DNA- 
binding protein Rc 
mRNA, complete cds; 
DNA binding protein Rc 


1 


294 


B16681 


343C3.TVB 
CIT978SKA1 
Homo sapiens 
genomic clone A- 
343C03. 


1.10E-05 


COPP YEAS 
T 


COATOMER BETA' 
SUBUNIT (BETA'- 
COAT PROTEIN) 
(BETA- 

COP)>PIR2:B55123 
coatomer complex beta' 
chain - yeast 
(Saccharomyces 
cerevisiae)>GPN:SCYG 
L137W_1 S;cerevisiae 
chromosome VII reading 
frame ORF 

YGL137w>GP:SCU1123 
7_1 Saccharomyces 
cerevisiae 


0.081 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


295 


Z16523 


H. sapiens 
(D9S158) DNA 
segment 
containing (CA) 
repeat; clone 
AFM073yb11; 
single read. 


1.00E-05 


MMSEMFJ 


M;musculus mRNA for 
semaphorin F; 
Smaphorin F 


0.78 


296 


Z49704 


S.cerevisiae 
chromosome XIII 
cosmid 8021. 


5.60E-06 


<NONE> 


<NONE> 


<NONE> 


297 


AC003071 


Human BAC 
clone BK085E05 
from 22ql2.1- 
qter, complete 
sequence. 


3.00E-06 


HSRCAERl 


H;sapiens mRNA for red 
cell anion exchanger 
(EPB3, AE1, Band3) 3' 
non-coding region 


0.21 


298 


U20428 


Human SNC19 
mRNA sequence. 


1.40E-06 


HUMMUC2A 
_1 


Human mucin-2 gene, 
partial cds 


4.40E- 
06 


299 


U51903 


Human RasGAP- 
related protein 
(IQGAP2) 
mRNA, complete 
cds. 


6.60E-07 


IQGA HUMA 
N 


RAS GTPASE- 
ACTIVATING-LIKE 
PROTEIN IQGAP1 
(P195)>PIR2:A54854 
Ras GTPase activating- 
related protein - 

human>GP:HUMIQGA_ 
1 Homo sapiens ras 
GTPase-activating-like 
protein (IQGAP1) 
mRNA, complete cds; 
Amino acid feature: IQ 
calmodulin-binding do 


1.60E- 
14 


300 


AL000805 


F.rubripes GSS 
sequence, clone 
021G08aAl. 


4.70E-07 


MT13 MYTE 
D 


METALLOTHIONEIN 
10-III (MT-10- 
III)>PIR2:S39418 
metallothionein 10-III - 
blue mussel 


2.20E- 
10 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


301 


AC003016 


Human BAC 
clone RG134C19 
from 8q21, 
complete 
sequence. 


4.30E-07 


SPC57A10 5 


S;pombe chromosome I 
cosmid c57A10; 
Unknown; 
SPAC57A10;05;c, 
unknown, len:606aa, 
similar to A; nidulans 
Q00659, sulfur 
metabolite repression 
control, (678aa) ? fasta 
scores, opt: 1355, 


0.00041 


302 


AC003089 


Human BAC 
clone 

1VVJ i (Jul UOrVj 

complete 
sequence. 


3.80E-07 


HPBPRECK 
1 


Hepatitis B virus type 1 1 
precore protein (pre-C 


0.41 


303 


AC002074 


Human BAC 
clone GS056H18 
from 7q31-q32, 
complete 
sequence. 


2.40E-07 


A47021J 


Sequence 23 from Patent 
WO9527787; Unnamed 
protein product; Author- 
given protein sequence is 
in conflict with the 
conceptual 
translation>GP:A51260_ 
1 Sequence 23 from 
Patent WO9614416; 
Unnamed protein 
product; Author-given 
protein sequence is i 


0.0016 


304 


U04980 


Rattus norvegicus 
fetal troponin T 3 
(fetal TnT3) 
mRNA, partial 
cds. 


2.20E-07 


HUMFSHD_1 


Human 

facioscapulohumeral 
muscular dystrophy 
(FSHD) gene region, 
D4Z4 tandem repeat unit; 
ORF 


3.30E- 
08 


305 


U68704 


Human 
chromosome 
21q22.3 PI -clone 
3804 subclone 4- 
52. 


2.00E-07 


HHV6AGNM 
_96 


Human herpesvirus-6 
(HHV-6) U1102, variant 
A, complete virion 
genome; U88;Cys 
repeats; this loci is open 
in all six reading frames, 
part of IE- A 


2.70E- 
05 
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Nearest 






Nearest 








- 


Neighbor 
(BlastN vs. 
Genbank) 






Neighbor 
(BlastX vs. 
Non- 
Redundant 
















Proteins) 








SEQ 


ACCESSION 


DESCRIPTION 


P 


ACCESSION 


DESCRIPTION 


P 




ID 






VALUE 






VALUE 




306 


U51583 


Rattus norvegicus 
zinc finger 
homeodomain 
enhancer-binding 
protein-1 (Zfhep- 
1) mRNA, partial 


8.70E-08 


AF005370_67 


Alcelaphine herpesvirus 
1 L-DNA, complete 
sequence; Putative 
immediate early protein; 
ORF73; similar to H; 
saimiri and KSHV 


6.10E- 
07 








cds. 






ORF73 






307 


M80206 


Mus domesticus 


8.10E-08 


I53960 


PRR2 alpha - human 


1.70E- 








poliovirus 






28 








receptor homolog 
(MPH) mRNA, 
complete cds. 












308 


M60854 


Human ribosomal 


5.70E-08 


OLVPOLJ 


Caprine arthritis 


0.27 








protein S16 
mRNA, complete 
cds. 






encephalitis virus (isolate 
OVLV-N1) pol protein 
gene, 3' end of cds; Nt 
2497-2695 from CAEV 
Co 






309 


U82828 


Homo sapiens 


1.50E-08 


C40201 


artifact-warning 


0.00044 








ataxia 

telangiectasia 






sequence (translated 

ALU class C) - human 










(ATM) gene, 
complete cds. 










310 


Z83836 


Human DNA 
sequence from 
PAC lllJ24on 
chromosome 
22q12-qter 
contains ESTs. 


1.40E-08 

- 


HSU64473J 


Human rheumatoid 
arthritis synovium 
immunoglobulin heavy 
chain variable region 
mRNA, partial 
cds>GP:HSU64498_l 
Human rheumatoid 
arthritis synovium 
immunoglobulin heavy 
chain variable region 
mRNA, partial cds 


0.34 




311 


Z50029 


Caenorhabditis 
elegans cosmid 
ZC504, complete 
sequence. 


1.40E-08 


MMU88984J 


Mus musculus NIK 
mRNA, complete cds 


1.70E- 

50 
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Nearest 






Nearest 








Neighbor 
(BlastN vs. 
Genbank) 






Neighbor 
(BlastX vs. 
Non- 














Redundant 














Proteins) 






SEQ 


ACCESSION 


DESCRIPTION 


P 


ACCESSION 


DESCRIPTION 


P 


ID 






VALUE 






VALUE 


312 


AC002351 


Homo sapiens; 
HTGS phase 1, 
17 unordered 
pieces. 


1.20E-08 


D41132 


collagen-related protein 4 
- Hydra magnipapillata 
(fragment)>PIR2:S21932 
mini-collagen - Hydra 
sp>GP:HSNCOL4 1 
Hydra N-COL 4 mRNA 
for mini-collagen; No 
start codon 


0.02 


313 


B65763 


CIT-HSP- 
2023A12.TR 
CIT-HSP Homo 
sapiens genomic 
clone 2023A12. 


3.60E-09 


S18106 


type II site-specific 
deoxyribonuclease (EC 
3.1.21.4) AbrI - 
Azospirillum brasilense 


0.045 


314 


Z93021 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
516C23; HTGS 
phase 1. 


2.00E-09 


AB001684 13 
4 


Chlorella vulgaris C-27 
chloroplast DNA, 
complete sequence; RNA 
polymerase gamma 
subunit 


0.6 


315 


D88035 


Rat mRNA for 
glycoprotein 
specific UDP- 
glucuronyltransferase, 
complete 
cds. 


1.50E-09 


D88035_l 


Rat mRNA for 
glycoprotein specific 
UDP- 

glucuronyltransferase, 
complete cds 


1.00E- 
33 


316 


U85193 


Human nuclear 
factor I-B2 
(NFIB2) mRNA, 
complete cds. 


1.30E-10 


VGF1JBVB 


F1 
PROTEIN>PIR1:VFIHB 
1 F1 protein - avian 

infectious bronchitis 
virus (strain 

Beaudette)>GP:IBACGB 
_1 Avian infectious 
bronchitis virus pol 
protein, spike protein, 
small virion-associated 
protein, membrane 
protein, and nucleocapsid 
protein gen 


1 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 

ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


317 


B04719 


cSRL-42G12-u 
cSRL flow sorted 
Chromosome 11 
specific cosmid 
Homo sapiens 
genomic clone 
CSRL-42G12. 


7.90E-11 


JC5238 


galactosylceramide-like 
protein, GCP - human 


0.31 


318 


M73506 


Mouse Tcp-10c(t 
allele) gene. 


2.80E-11 


A39487 


T-complex protein 10a 
(allele 129) - mouse 


4.10E- 
16 


319 


U71148 


Human Xq28 
cosmids U225B5 
and U236A12, 
complete 
sequence. 


1.20E-11 


A56547 


sex-peptide precursor - 
Drosophila suzukii 


0.4 


320 


Z95116 


Human DNA 
sequence *** 

SEQUENCING 
IN PROGRESS 
*** from clone 
57G9; HTGS 
phase 1. 


9.90E-13 


ALU2 HUM 
AN 


!!!! ALU SUBFAMILY 
SB WARNING ENTRY 
!!!! 


0.0017 


321 

1 


M64795 


Rat MHC class I 
antigen gene 
(RT1-u 

haplotype), 
complete cds. 


1.70E-14 


STC_DROME 


SHUTTLE CRAFT 
PROTEIN>GP:DMU093 
06_1 Drosophila 
melanogaster shuttle craft 
protein (stc) mRNA, 
complete cds; C-terminal 
111 amino acids encode a 
novel single- stranded 
DNA binding domain 


1.40E- 
13 


322 


Y09036 


H.sapiens 
NTRK1 gene, 
exon 17. 


4.20E-15 


AF010403J 


Homo sapiens ALR 
mRNA, complete cds; 
Alternatively spliced; 
similarity to ALL- 1 and 
Drosophila trithorax 




323 


U12523 


Rattus norvegicus 
ultraviolet B 
radiation- 
activated UV98 
mRNA, partial 
sequence. 


2.90E-15 


SPBC30D10 
4 


S;pombe chromosome II 
cosmid C30D10; 
Hypothetical protein; 
SPBC30D10;04, 
unknown, len:148aa 


2.40E- 
09 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


324 


Z98755 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
76C18; HTGS 
phase 1. 


2.20E-15 


RPON HAL 
MA 


DNA-DIRECTED RNA 
POLYMERASE 
SUBUNIT N (EC 
2.7.7.6)>PIR2:D41715 
DNA-directed RNA 
polymerase II chain 
RPB10 homolog - 
Haloarcula 

marismortui>GP:HALH 
MAENOA_4 
H;marismortui tRNA- 
Leu, HL29, HmaL13, 
HmaS9, OrfMMV, 
OrfMNA, 2- 
phosphoglycerate dehydr 


0.019 


325 


M86917 


Human oxysterol- 
binding protein 
(OSBP) mRNA, 
complete cds. 


1.60E-15 


CEF14H8_2 


Caenorhabditis elegans 
cosmid F14H8, complete 
sequence; F14H8;1; 
Similarity to Human 
oxysterol-binding protein 
(SW:OXYB_HUMAN) 


2.10E- 
18 


326 


AC001231 


Genomic 
sequence from 
Human 17, 
complete 
sequence. 


1.30E-15 


AC002397J 


Mouse BAC284H12 
Chromosome 6, complete 
sequence; DRPLA 


0.0016 


327 


AL008626 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
1114G22; HTGS 
phase 1. 


5.30E-16 


TAU48227_1 


Triticum aestivum 
soluble starch synthase 
mRNA, partial cds 


5.90E- 
05 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


328 


L04483 


Human ribosomal 
protein S21 
(RPS21) mRNA, 
complete cds. 


7.60E-17 


RS21_HUMA 
N 


40S RIBOSOMAL 
PROTEIN 
S21>PIR2:S34108 
ribosomal protein S21 - 
human>GP:SSZ84015_1 
S;scrofa mRNA; 
expressed sequence tag 
(3'; clone cllglO); 40S 
ribosomal protein S21; 
Similar to human 40S 
ribosomal protein 
S21>GP:HUMRPS21X_ 
1 Human ribosomal 


1.40E- 
09 


329 

- 


AB001899 


Homo sapiens 
PACE4 gene 
exon 2. 


6.70E-17 


LRP1_HUMA 
N 


LOW-DENSITY 
LIPOPROTEIN 

RECEPTOR-RELATED 
PROTEIN 1 
PRECURSOR (LRP) 
(ALPHA-2- 
MACROGLOBULIN 
RECEPTOR) (A2MR) 
(APOLIPOPROTEIN E 
RECEPTOR) 
(APOER)>PIR2:S02392 
LDL receptor-related 
protein precursor - 
human>GP:HSLDLRRL 
_1 Human mRNA for 
LDL-recept 


1 


330 


Z98755 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
76C18; HTGS 
phase 1. 


4.40E-17 


U97553_59 


Murine herpesvirus 68 
strain WUMS, complete 
genome; Ribonucleotide 
reductase large 


0.06 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


331 


AF017187 


Homo sapiens 
LTR HERV-K 
repetitive element 
fragment 
ltr_19_9a 
sequence. 


3.90E-18 


D84255J 


Ovophis okinavensis 
mitochondrial DNA for 
NADH dehydrogenase 
subunit 1, partial cds, Ile- 
tRNA, Pro-tRNA, Phe- 
tRNA, Gln-tRNA, Met- 
tRNA and control region 

fD-loon rpcnntiV Xbi^; cd<i 


0.007 


332 


B36252 


HS-1038-A2- 
G01-MR.abi CIT 
Human Genomic 
Sperm Library C 
Homo sapiens 
genomic clone 
Plate=CT 820 
Col=2 Row=M. 


3.10E-18 


PGBM MOU 
SE 


BASEMENT 
MEMBRANE- 
SPECIFIC HEPARAN 
SULFATE 
PROTEOGLYCAN 
CORE PROTEIN 
PRECURSOR (HSPG) 
(PERLECAN) 
(PLC)>PIR2:S18252 
heparan sulfate 
proteoglycan - 
mouse>GP:MUSPERPA 
_1 Mouse perlecan 
mRNA, complete cds 


0.00015 


333 


D78255 


Mouse mRNA for 
PAP-1, complete 
cds. 


2.70E-18 


MUSPAP1J 


Mouse mRNA for PAP- 
1, complete cds 


3.50E- 
18 


334 


AC003046 


Human Xp22 
PACs RPC11- 
263P4 and 
RPC11-164K3 
complete 
sequence. 


1.40E-18 


CEC34F6_1 


Caenorhabditis elegans 
cosmid C34F6; C34F6;1; 
CDNA EST yk46b12;5 
comes from this gene; 
cDNA EST yk44c4;5 
comes from this gene; 
cDNA EST yk46b12;3 
comes from this gene 


0.0015 


335 


AC003002 


Human DNA 
from overlapping 

chromosome 19- 

specific cosmids 

R29515 and 

R28253, genomic 

sequence, 

complete 

sequence. 


1.40E-18 


MUSZFPOJ 


Mouse mRNA for zinc 
finger protein, partial 
sequence 


1.30E- 
19 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 


VALUE 


336 


Y15054 


Rattus norvegicus 
mRNA for 70 
kDa tumor 
specific antigen, 
partial. 


3.40E-19 


HS4U2IR2_1 


Epstein-Barr virus 
(AG876 isolate) U2-IR2 
domain encoding nuclear 
protein EBNA2, 
complete cds; Nuclear 
antigen 2 


2.00E- 
06 


337 




Human DNA 

sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
295C6; HTGS 
phase 1. 


1 1 Q 


Ar(Jl)j5j5_l 


Homo sapiens L1 
element ORF2-like 
protein gene, partial cds 


7.00E- 
05 


338 


M97159 


Mouse (clone 
pIL2) B1 
dispersed repeat 
unit. 


1.10E-19 


A26882 


pIL2 hypothetical protein 
- rat 

(fragment)>GP:RATTD 
R_l Rat growth and 
transformation-dependent 
mRNA, 3' end; Growth 
and transformation 
dependent protein 


0.2 


339 


U30817 


Bos taurus very- 
long-chain acyl- 
CoA 

dehydrogenase 
mRNA, nuclear 
gene encoding 
mitochondrial 
protein, complete 
cds. 


4.70E-20 


ACDV_RAT 


ACYL-COA 
DEHYDROGENASE, 

VERY-LONG-CHAIN 

SPECIFIC 
PRECURSOR (EC 
1.3.99.-) 

(VLCAD)>PIR2:A54872 
acyl-CoA dehydrogenase 
(EC 1.3.99.-) very-long- 
chain-specific precursor - 
rat>GP:RATVLCAD_l 
Rat mRNA for very- 
long-chain Acyl-CoA 
dehydrogenase, compl 


8.10E- 
25 


340 


Y11535 


H.sapiens mRNA 
for SHOXb 
protein. 


2.80E-20 


ALU1 HUM 
AN 


!!!! ALU SUBFAMILY J 
WARNING ENTRY!!!! 


0.00027 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


341 


AL008730 


Human DNA 
sequence *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
487J7; HTGS 
phase 1. 


7.10E-21 


C40201 


artifact-warning 
sequence (translated 
ALU class C) - human 


0.001 


342 


U96629 


Human 
chromosome 8 
BAC clone 

PTTQR7W OAS 

^1 170/ OIV"Z/\0 

complete 
sequence. 


5.30E-23 


ALU1 HUM 
AN 


!!!!ALU SUBFAMILY J 
WARNING ENTRY!!!! 


3.80E- 
10 


343 


U95743 


Homo sapiens 
chromosome 16 
BAC clone 
CIT987-SK65D3, 

complete 
sequence. 


2.10E-24 


UROM_HUM 
AN 


UROMODULIN 

HORSFALL URINARY 
GLYCOPROTEIN) 
(THP)>PIR2:A30452 
uromodulin precursor - 
human>GP:HUMUMOD 
_1 Human uromodulin 
(Tamm-Horsfall 
glycoprotein) mRNA, 
complete cds; 
Uromodulin precursor 


1 


344 


U15972 


Mus musculus 
homeobox 
(Hoxa7) gene, 
complete cds. 


4.00E-25 


S20790 


extensin - 

almond>GP:PAEXTSJ 
P;amygdalus mRNA for 
extensin 


0.34 


345 


U15972 


Mus musculus 
homeobox 
(Hoxa7) gene, 
complete cds. 


4.00E-25 


CA24 CAEE 
L 


COLLAGEN ALPHA 
2(IV) CHAIN 

PRECURSOR>GP:CEC 
OLA2IVJ2 C;elegans 
a2(IV) collagen gene; 

Alternatively spliced 
transcript 


0.1 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


346 


Z66242 


H.sapiens CpG 
island DNA 
genomic Msel 
fragment, clone 
84a4, reverse read 
cpg84a4.rtla. 


4.80E-26 


CEC35A5_8 


Caenorhabditis elegans 
cosmid C35A5, complete 
sequence; C35A5;8; 
CDNA EST yk31f6;5 
comes from this gene; 
cDNA EST yk38h1;3 
comes from this gene; 
cDNA EST yk38h1;5 
comes from this gene; 


7.70E- 
19 


347 


L25331 


Rattus norvegicus 
lysyl hydroxylase 
mRNA, complete 
cds. 


3.90E-26 


LYSH CHIC 
K 


PROCOLLAGEN- 
LYSINE,2- 

OXOGLUTARATE 5- 
DIOXYGENASE 
PRECURSOR (EC 
1.14.11.4) (LYSYL 
HYDROXYLASE)>PIR 
Z.A2374Z procollagen- 
lysine 5-dioxygenase (EC 
1.14.11.4) precursor - 

chicken>GP:CHKLYH__ 
1 Chicken lysyl 

hydroxylase mRNA, 
complete cds 


1.10E- 
43 


348 




Drosophila 

melanogaster 
(subclone 2 d7 
from PI DS04260 
(D68)) DNA 
sequence, 
complete 
sequence. 




CELC52B9_2 


Caenorhabditis elegans 
cosmid C52B9; Coded 
for by C; elegans cDNA 
cml ld6; weakly similar 
to S; cervisiae PTM1 
precursor (SP:P32857) 


8.40E- 
29 


349 


U78082 


Human RNA 
polymerase 
transcriptional 
regulation 
mediator (h- 
MED6) mRNA, 
complete cds. 


2.30E-26 


HSU78082_1 


Human RNA polymerase 
transcriptional regulation 
mediator (h- MED6) 
mRNA, complete cds; H- 
Med6p 


1.50E- 
16 


350 


U43381 


Human Down 
Syndrome region 
of chromosome 
21 DNA. 


2.10E-28 


HSMRNAEB 
1 




H;sapiens genomic DNA, 
integration site for 
Epstein-Barr virus; 
Hypothetical protein 


0.18 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


351 


D50416 


Mouse mRNA for 
AREC3, 
complete cds. 


2.50E-29 



A29947 


prostaglandin- 

endoperoxide synthase 
(EC 1.14.99.1) precursor 

sheep>GP:SHPCOXA_l 

Sheep prostaglandin 
endoperoxide synthetase 
(cyclooxygenase), 
complete cds; 
Cyclooxygenase 
precursor (EC 1.14.99.1) 


0.01 


352 


U85193 


Human nuclear 
factor I-B2 
(NFIB2) mRNA, 
complete cds. 


2.20E-29 


CFU30222J 


Crithidia fasciculata fully 
edited ATPase subunit 6 
(MURF4) mRNA, partial 
cds; Cryptogene 


0.53 


353 


Z92826 


Caenorhabditis 
elegans DNA *** 
SEQUENCING 
IN PROGRESS 
*** from clone 
C18D11; HTGS 
phase 1. 


1.10E-30 


SPAC1B3_5 


S;pombe chromosome I 
cosmid c1B3; 
Hypothetical protein; 
SPAC1B3;05, probable 
transcriptional regulator, 
len:630aa, similar eg; to 

NOT3_YEAST, P06102, 
general negative 
regulator, 


3.20E- 
35 


354 


L09604 


Homo sapiens 
differentiation- 
dependent A4 
protein mRNA, 
complete cds. 


3.70E-32 


PVU72769_1 


Phaseolus vulgaris 
PvPRP-12 (Pvprpl-12) 
mRNA, partial cds; 
Similar to cell wall 
proline rich 

protein>GP:PVU72769_ 
1 Phaseolus vulgaris 
PvPRP-12 (Pvprpl-12) 
mRNA, partial cds; 
Similar to cell wall 
proline rich protein 


0.00049 
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(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


355 


B42455 


HS-1055-B2- 
G03-MR.abi CIT 
Human Genomic 
Sperm Library C 

Homo sapiens 

genomic clone 
Plate=CT 777 
Col=6 Row=N. 


1.30E-32 


CELT05H4_8 


Caenorhabditis elegans 
cosmid T05H4; Similar 
to the beta transducin 
family; coded for by C; 
elegans cDNA 
ykl56ell;3; coded for by 
C; elegans cDNA 
ykl4c8;3; coded for by 
C; elegans cDNA 


6.90E- 
14 


356 


AF001905 


Homo sapiens 
cosmids E079, 

R0090 and A 8 

from Xq25 X- 
linked 

lymphoproliferati 
ve disease gene 
candidate region, 

sequence. 


1.80E-33 


I38344 


titin - human 


1 


357 


E03743 


DNA sequence 
including male 
hormone 
dependent gene 
derived from 
hamster 
flank organ. 


1.10E-34 


CELC03A7_2 


Caenorhabditis elegans 
cosmid C03A7; Weak 

similarity to serotonin 

receptors 


0.59 


358 


U31199 


Human laminin 
gamma2 chain 
gene (LAMC2), 
exon 22 and 
flanking 
sequences. 


1.20E-35 


B44018 


laminin B2t chain - 
human>GP:HSLAMB2T 
B_l H;sapiens mRNA 
for laminin 


1.20E- 
14 


359 


D14678 


Human mRNA 
for kinesin- 
related protein, 
partial cds. 


2.00E-36 


D49544J 


Mouse mRNA for 
KIFC1, complete cds 


1.20E- 
23 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


360 


A£>UUU4zj 


Porcine DNA for 
endopeptidase 
24.16, exon 16 
and complete cds. 


8.20E-38 


POL4 DROM 
E 


RETROVIRUS- 
RELATED POL 
POLYPROTEIN 
(PROTEASE (EC 
3.4.23.-); REVERSE 
TRANSCRIPTASE (EC 
2.7.7.49); 

ENDONUCLEASE) 
(TRANSPOSON 
412)>PIR1:GNFF42 
retrovirus-related pol 
polyprotein - fruit fly 
(Drosophila 

melanogaster) transposon 
412>GP:DMRT412G 4 


0.65 


361 


U39875 


Rattus norvegicus 
EF-hand Ca2+- 
binding protein 
p22 mRNA, 
complete cds. 


8.80E-42 


I56333 


apolipoprotein B - rat 
(fragment)>GP:RATAP 
OLPB_l Rattus 
norvegicus (clone rb9E) 
apolipoprotein B apoB 
mRNA, 3' end 


0.23 


362 


L09647 


Rattus norvegicus 

hepatocyte 
nuclear factor 3a 
(HNF-3 beta) 
mRNA, complete 
cds. 




HN3B_RAT 


HEPATOCYTE 

NUCLEAR FACTOR 3- 

BETA (HNF- 

3B)>GP:RATHNF3B_1 

Rattus norvegicus 

hepatocyte nuclear factor 

3a (HNF-3 beta) mRNA, 

complete 

cds>TFD:TFDP01611 - 
Polypeptides entry for 
factor HNF-3 (beta) 


O.lUfc,- 

25 


363 


D25538 


Human mRNA 
for KIAA0037 
gene, complete 
cds. 


4.10E-43 


CELC34D4 1 
2 


Caenorhabditis elegans 
cosmid C34D4 


0.018 






Docket No. 1480P 

Table 2 





Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


364 


Z56764 


H.sapiens CpG 
island DNA 
genomic Msel 
fragment, clone 
13f7, reverse read 
cpg13f7.rtla. 


1.40E-43 


S75263 


hypothetical protein - 
Synechocystis sp. (PCC 
6803)>GP:D90904_29 
Synechocystis sp; 
PCC6803 complete 
genome, 6/27, 630555- 
781448; Hypothetical 
protein; ORF ID:sI10983 


0.0028 


365 


AC002636 


*** 

SEQUENCING 
IN PROGRESS 
*** Drosophila 
melanogaster 
(subclone 2_g4 
from P1 DS03323 
(D127)) DNA 
sequence; HTGS 
phase 2. 


8.40E-44 


DMU95760_1 


Drosophila melanogaster 
strawberry notch (sno) 
mRNA, complete cds; 
Notch pathway 
component; nuclear 
protein 


3.40E- 
51 


366 


J05499 


Rattus norvegicus 
L-glutamine 
amidohydrolase 
mRNA, complete 
cds. 


8.00E-44 


GLSL RAT 


GLUTAMINASE 
LIVER ISOFORM 
PRECURSOR (EC 
3.5.1.2) 

(GLS)>GP:RATGAHJ 
Rattus norvegicus L- 
glutamine 

amidohydrolase mRNA, 
complete cds 


O.VJUJQ- 

29 


367 


U95760 

— 


Drosophila 
melanogaster 
strawberry notch 
(sno) mRNA, 
complete cds. 


5.00E-45 


DMU95760_1 


Drosophila melanogaster 
strawberry notch (sno) 
mRNA, complete cds; 
Notch pathway 
component; nuclear 
protein 


4.80E- 
45 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


368 


L10106 


Mus musculus 
protein tyrosine 
phosphate 
mRNA, complete 
cds. 


4.10E-45 


PTPK HUMA 
N 


PROTEIN-TYROSINE 
PHOSPHATASE 

KAPPA PRECURSOR 
(EC 3.1.3.48) (R-PTP- 
KAPPA)>GP:HSPTPKA 
PJ H;sapiens mRNA for 
phosphotyrosine 
phosphatase kappa; 
Human phosphotyrosine 
phosphatase kappa 


4.70E- 
16 


369 


D17218 


Human HepG2 3' 
region Mbol 
cDNA, clone 
hmd3g02m3. 


9.40E-47 


MMU53563J 


Mus musculus Brgl 
mRNA, partial cds; N- 
terminal region of the 
protein 


0.00012 


370 


U78310 


Homo sapiens 
pescadillo 
mRNA, complete 

cds. 


8.10E-48 


HSU78310J 


Homo sapiens pescadillo 
mRNA, complete cds 


1.10E- 
21 


371 


AC000399 


Genomic 
sequence from 
Mouse 9, 
complete 
sequence. 


7.40E-48 


KIP2_YEAST 


KINESIN-LIKE 
PROTEIN 
KIP2>PIR1:C42640 
kinesin-related protein 
KIP2 - yeast 
(Saccharomyces 
cerevisiae)>GP:SCKIP2 
aVI_z o,cerevisiae rbr4 
and KIP2 genes encoding 
PEP4 proteinase (partial) 
and kinesin-related 
protein 

KIP2>GP:SCLACHXVI 
17 S;cerev 


0.14 


372 


AC002327 


*** 

SEQUENCING 
IN PROGRESS 
*** Genomic 
sequence from 
Mouse 7; HTGS 
phase 1, 3 
unordered pieces. 


1.40E-48 


CHKC1A205 
1 


Chicken alpha-2 type-1 
collagen; amino acids -16 
to 3; Precollagen alpha-2 


0.024 
Nearest
Neighbor
(BlastN vs.
Genbank)


Nearest
Neighbor
(BlastX vs.
Non-
Redundant
Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P

VALUE


ACCESSION


DESCRIPTION


P

VALUE


373


X67016


H.sapiens mRNA
for amphiglycan.


9.00E-49


CED2085_2


Caenorhabditis elegans
cosmid D2085, complete
sequence; D2085;l;
Similar to glutamine-
dependent carbamoyl-
phosphate synthase,
aspartate
carbamoyltransferase,
dihydroorotase; cDNA
EST
cm16f3>GP:CED2085_2
Caenorhabditis elegans
cosmid D2085: D


0.14


374


L10409


Mouse fork head
related protein
(HNF-3beta)
mRNA, complete
cds.


1.50E-49


MMU04197_1


Mus musculus HNF3
beta transcription factor
(HNF3b) mRNA, partial
cds; Sequence of this
partial cDNA begins in
the first third of the
conserved
HNF3/forkhead DNA
binding domain


1.20E-
30


375


U01139


Mus musculus
B6D2F1 clone
2C11B mRNA.


1.20E-49


SPBC3D5_14


S;pombe chromosome II
cosmid c3D5; Unknown;
SPBC3D5;14c,
unknown; partial; serine
rich, len:309aa, similar
eg; to YNL283C,
YN23_YEAST,P53832,
hypothetical 52;3 kd
protein, (503aa),


0.00091



376 



Z82170 



Human DNA 
sequence from 

PAC 326L13
containing brain- 
4 mRNA ESTs 
and polymorphic 
CA repeat. 



9.00E-50 



BSU55043_3



Bacillus subtilis plasmid 
pPOD2000 Rep, RapAB, 
RapA, ParA, ParB, and 
ParC genes, complete 
cds; ORF3 



0.025 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


377 


Z99289 


Human DNA 
sequence *** 

SEQUENCING 
IN PROGRESS 
*** from clone 
142L7; HTGS 
phase 1. 


7.70E-50 


A64431 


hypothetical protein 
MJ1050 - 
Methanococcus 
jannaschii>GP:MJU6754 
8_2 Methanococcus 
jannaschii from bases 
986219 to 996377 
(section 90 of 150) of the 
complete genome; M; 
jannaschii predicted 
coding region MJ1050; 
Identified by GeneMark; 
putativ 


5.60E- 
05 


378 


X98260 


H.sapiens mRNA 
for M-phase 
phosphoprotein, 
mpp11.


6.20E-50 


ZRF1_MOUSE


ZUOTIN RELATED 
FACTOR>GP:MMU532 
08_1 Mus musculus 
zuotin related factor 
(ZRF1) mRNA, complete 
cds; Similar to DnaJ 
encoded by GenBank 
Accession Number 
L16953


3.90E- 
30 


379 


M18981 


Human prolactin 
receptor- 
associated protein 
(PRA) gene, 
complete cds. 


9.00E-52 


S106_HUMAN


CALCYCLIN 
(PROLACTIN 
RECEPTOR 
ASSOCIATED 
PROTEIN) (PRA) 
(GROWTH FACTOR- 
INDUCIBLE PROTEIN 
2A9) (S100 CALCIUM-
BINDING PROTEIN 
A6)>PIR1:BCHUY 
calcyclin - 

human>GP:HUMCACY 
_1 Human calcyclin 
gene, complete 
cds>GP:HUMCACYA_1
Human prolactin recept 


8.80E- 
24 


380 


AB006622 


Homo sapiens 
mRNA for 
KIAA0284 gene, 
partial cds. 


1.60E-53 


S33015 


hypothetical protein - 
human herpesvirus 4 


0.00088 


Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


381 


U53225 


Human sorting 
nexin 1 (SNX1) 
mRNA, complete 
cds. 


1.80E-55 


G02522 


sorting nexin 1 - 
human>GP:HSU53225_1
Human sorting nexin 1
(SNX1) mRNA,
complete cds 


9.20E- 
50 


382 


Z92844 


Human DNA 
sequence from 

PAC 435C23 on 

chromosome X. 
Contains ESTs. 


6.50E-56 


D14487_1


Lentinus edodes 
Le;MFB1 mRNA,
complete cds 


1 


383 


D87450 


Human mRNA 
for KIAA0261
gene, partial cds. 


4.30E-56 


D87450_1


Human mRNA for 
KIAA0261 gene, partial 
cds; Similar to 
D;melanogaster parallel 
sister chromatids protein 


4.30E- 
30 


384 


AC002301 


***

SEQUENCING 
IN PROGRESS 
*** Human 
chromosome 
16p11.2 BAC
clone CIT987SK- 
A-328A3; HTGS 
phase 2, 1 
ordered pieces. 


9.80E-57 


S62328 


kinesin-like DNA 
binding protein KID - 
human>GP:HUMKID_1
Human mRNA for Kid 
(kinesin-like DNA 
binding protein), 
complete cds 


2.60E- 
27 


385 


L29766 


Homo sapiens 
epoxide hydrolase 
(EPHX) gene, 
complete cds. 


7.30E-57 


HSBCTCF4J 


Homo sapiens mRNA for 
hTCF-4 


2.30E- 
05 


386 


U58884 


Mus musculus 
SH3-containing
protein SH3P7 
mRNA, complete 
cds. similar to 
Human Drebrin. 


3.30E-58 


MMU58884_1


Mus musculus SH3- 
containing protein 
SH3P7 mRNA, complete 
cds; similar to Human 
Drebrin; SH3-containing 
protein; similar to human 
drebrin 


6.00E- 
43 


387 


Y15054 


Rattus norvegicus 
mRNA for 70 
kDa tumor 
specific antigen, 
partial. 


9.50E-59 


RNY15054_1


Rattus norvegicus mRNA 
for 70 kDa tumor specific 
antigen, partial; 70 kD 
tumor-specific antigen 


4.70E- 
45 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


388 


AC000406 


*** 

SEQUENCING 
IN PROGRESS 
*** Human 
Chromosome 11
overlapping pacs
pDJ235k10 and
pDJ239b22; 
HTGS phase 1, 
l / unordered 
pieces. 


7.40E-59 


<NONE> 


<NONE> 


<NONE>


389 


L42612 


Homo sapiens 
keratin 6 isoform 
K6f (KRT6F) 
mRNA, complete 
cds. 


3.60E-59 


KRHUEA 


keratin, type II 
cytoskeletal - human 
(fragment)>GP:HSKER 
A_1 Human messenger
fragment encoding 
cytoskeletal keratin (type 
II); mRNA from cultured 
epidermal cells from 
human 

foreskin>GP:HUMKER5 
6K_1 Human 56k 
cytoskeletal type II 
keratin mRNA 


7.60E- 
30 


390 


L29766 


Homo sapiens
epoxide hydrolase
(EPHX) gene,
complete cds.


2.70E-60 


EGR2_HUMAN


EARLY GROWTH
RESPONSE PROTEIN 2
(EGR-2) (KROX-20 
PROTEIN) 

(AT591)>GP:HUMEGR 
2A_1 Human early 
growth response 2 
protein (EGR2) mRNA, 
complete 

cds>TFD:TFDP00485 - 
Polypeptides entry for 
factor Egr-2 


7.80E- 
06 


391 


L08758 


Mus musculus 
homeobox protein 
(Hox A10) gene,
5' end of cds. 


1.40E-60


PAALGYGEN_1


P;aeruginosa algY gene; 
Alginate lyase 


0.00031 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


392 


I29058


Sequence 3 from 
patent US
5576423. 


4.20E-61 


JC5106 


stromal cell-derived
factor 2 -

human>GP:D50645_1
Human mRNA for SDF2,
complete cds; Stroma
cell-derived factor-2


1.50E-
32

393 


I29058


Sequence 3 from 
patent US 
5576423. 


4.20E-61 


JC5106 


stromal cell-derived 
factor 2 - 

human>GP:D50645_1
Human mRNA for SDF2, 
complete cds; Stroma 
cell-derived factor-2 


1.50E- 
32 


394 


U46067 


Capra hircus 
beta-mannosidase 
mRNA, complete 
cds. 


1.90E-62 


CHU46067_1


Capra hircus beta- 
mannosidase mRNA, 
complete cds 


2.70E-
39 


395 


U40747 


Mus musculus 
formin binding 
protein 11
mRNA, partial 
cds. 


6.90E-63 


S64713 


formin binding protein
11 - mouse

(fragment)>GP:MMU40
747_1 Mus musculus
formin binding protein
11 mRNA, partial cds;
FBP11; Formin binding
protein 11; tandem
WWP/WW domains 
separated by 15 amino 
acid linker 


3.00E- 
46


396


M36164


Human 

glyceraldehyde-3- 
phosphate 
dehydrogenase 
mRNA, 3' flank.


1.10E-63


BHT1UL_12 


Bovine herpesvirus type 

1 UL22-35 genes; 
UL26;5>GP:BHU31809_ 

2 Bovine herpesvirus 1 
maturational proteinase 
(UL26) gene, complete 
cds, and scaffold protein 
(UL26;5) gene, complete 
cds 


0.003 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
Redundant 
Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


397 




H.sapiens
NTRK1 gene, 
exon 17. 


*7 inn &z 


MMU39060_1


Mus musculus 
glucocorticoid receptor 
interacting protein 1 
(GRIP1) mRNA, 
complete cds; Hormone- 
dependent interaction 
with hormone binding 
domains of steroid 
receptors; transactivation 


0.0054 


398 


U17901 


Rattus norvegicus 
phospholipase A- 
2-activating 
protein (plap) 
mRNA, complete
cds. 


2.70E-70 


JC4239 


phospholipase A2- 
activating protein - rat 


8.40E- 
17 


399 


D12646


Mouse kif4 
mRNA for 
microtubule- 
based motor 
protein KIF4, 
complete cds. 


1.70E-74 


KIF4_MOUSE


KINESIN-LIKE 
PROTEIN 
KIF4>PIR2:A54803 
microtubule-associated 
motor KIF4 - 
mouse>GP:MUSKIF4_1
Mouse kif4 mRNA for 
microtubule-based motor 
protein KIF4, complete 
cds; ATP-binding site: 
base980-1037, motor 
domain: base732-1781, 
alpha-helical co


1.10E- 
44 


400 


AF007860 


Xenopus laevis 
xl-Mago mRNA, 
complete cds. 


4.60E-75 


AF007862_1


Mus musculus mm-Mago 
mRNA, complete cds; 
Similar to Drosophila 
melanogaster Mago 
protein 


6.50E- 
68 


401


I45565


Sequence 15 from 
patent US 
5637463. 


2.30E-82 


RNU57391_1


Rattus norvegicus FceRI 
gamma-chain interacting 
protein SH2-B (SH2-B)
mRNA, complete cds; 
Putative FceRI gamma 
ITAM interacting 
protein; SH2 domain- 
containing protein B; 
Method: conceptual 


9.90E- 
42 
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Nearest 
Neighbor 
(BlastN vs. 
Genbank) 


Nearest 
Neighbor 
(BlastX vs. 
Non- 
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Proteins) 


SEQ 


ACCESSION 


DESCRIPTION 


P 

VALUE 


ACCESSION 


DESCRIPTION 


P 

VALUE 


402 


U29156 


Mus musculus 
eps15R mRNA,
complete cds. 


1.00E-85 


MMU29156_1


Mus musculus eps15R
mRNA, complete cds; 
Involved in signaling by 
the epidermal growth 
factor receptor; Method: 
conceptual translation 
supplied by author 


4.90E- 
62


403


U70139


Mus musculus 
putative CCR4 
protein mRNA,
partial cds. 


1.00E-85


MMU70139_1 


Mus musculus putative 
CCR4 protein mRNA,
partial cds; similar to
yeast transcription factor
CCR4; transcriptional 
readthrough occurs with 
transcription being 
initiated at the IAP and 
continues 


7.20E- 
66 


404 


U82626 


Rattus norvegicus 
basement 
membrane- 
associated 
chondroitin 
proteoglycan 
Bamacan mRNA, 
complete cds. 


7.60E-96 


RNU82626_1


Rattus norvegicus 
basement membrane- 
associated chondroitin 
proteoglycan Bamacan 
mRNA, complete cds; 
Chondroitin sulfate 
proteoglycan; CSPG 


8.20E- 
58 
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


405 


L09604 


Homo sapiens 
differentiation- 
dependent A4 
protein mRNA, 
complete cds. 


2.00E-35 


<NONE> 


<NONE> 


<NONE> 


406 


AB000516 


Homo sapiens 
mRNA for DSIF 
p160, complete
cds 


0.41 


POLG_TUMVQ 


GENOME 
POLYPROTEIN 
(CONTAINS: N- 
TERMINAL 
PROTEIN; HELPER 
COMPONENT 
PROTEINASE (EC 
3.4.22.-) (HC-PRO); 
42-50 KD PROTEIN; 
CYTOPLASMIC 
INCLUSION 
PROTEIN (CI); 6 KD 
PROTEIN; VPG 
PROTEIN; 
NUCLEAR 
INCLUSION 
PROTEIN A (NI-A) 


2.9 


407 


Z94753 


Human DNA 
sequence from 
PAC 465G10 on
chromosome X 
contains Menkes 
Disease (ATP7A) 
putative Cu++-
transporting P- 
type ATPase 
exons 22, 23 and 
STS 


0.004


<NONE> 


<NONE> 


<NONE> 


408 


AB011123 


Homo sapiens 
mRNA for 
KIAA0551 
protein, partial 
cds 


0 


MI15_CAEEL 


Q23356 

caenorhabditis 

elegans. 

serine/threonine- 
protein kinase mig-15 
(ec 2.7.1.-). 11/98 


2.00E-51 


409 


D17218 


Human HepG2 3' 
region MboI
cDNA, clone 
hmd3g02m3 


e-123 


NARG_BACSU




NITRATE 
REDUCTASE 
ALPHA CHAIN (EC 
1.7.99.4) 


9.9 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


410 


M95098 


Bos taurus 
lysozyme gene 
(cow 2), complete 
cds 


1.1 


HAIR_MOUSE 


HAIRLESS 
PROTEIN 


8.00E-10 


411 


Z60048 


H.sapiens CpG 
DNA, clone 
187a9, reverse
read
cpg187a9.rt1a.


4.00E-54 


HN3B_MOUSE 


HEPATOCYTE 
NUCLEAR FACTOR 
3-BETA (HNF-3B) 


4.00E-21 


412 


Z48975 


P.magnus gene 
for protein urPAB 


0.014 


YPT2_CAEEL 


HYPOTHETICAL 
21.6 KD PROTEIN 
F37A4.2 IN 
CHROMOSOME III 


2.00E-12 


413 


AJ001296 


Notophthalmus 
viridescens 
mRNA for 
cytokeratin 8 


0.37 


YA53_SCHPO 


HYPOTHETICAL 
24.2 KD PROTEIN 
C13A11.03 IN
CHROMOSOME I 


5.00E-21 


414 


J03831 


Xenopus laevis 
(clone pXEC1.3) 
C protein mRNA, 
complete cds. 


0.37 


PDR5_YEAST 


SUPPRESSOR OF 
TOXICITY OF 
SPORIDESMIN 


3.3 


415 


AB007157 


Homo sapiens 
gene for 

ribosomal protein 
S21, partial cds 


e-142 


RS21_HUMAN 


40S RIBOSOMAL 
PROTEIN S21 


0.002 


416 


X86340 


H. sapiens C7 
gene, exon 13 


3.3 


STC_DROME 


SHUTTLE CRAFT 
PROTEIN 


4.3 


417 


U12404


Human Csa-19 
mRNA, complete 
cds. 


0 


R10A_PIG 


60S RIBOSOMAL 
PROTEIN L10A
(CSA-19) 
(FRAGMENT) 


9.00E-57 


418 


U95102 


Xenopus laevis
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


8.00E-08 


<NONE> 


<NONE> 


<NONE> 


419 


M80198 


Human FKBP-12 
pseudogene, clone 
lambda-512, 5' 
flank and 
complete cds. 


5.00E-14 


RC01_NEUCR 


TRANSCRIPTIONA 
L REPRESSOR RCO- 
1 


0.008 


420 


AF052573 


Homo sapiens 
DNA polymerase 
eta (POLH) 
mRNA, complete 
cds 


0 


<NONE> 


<NONE> 


<NONE> 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


421 


AF035940 


Homo sapiens 
MAGOH mRNA, 
complete cds 


e-131 


MGN_DROME 


MAGO NASHI
PROTEIN 


4.00E-39 


422 


AF054994 


Homo sapiens 
clone 23832 
mRNA sequence 


0.12 


<NONE> 


<NONE> 


<NONE> 


423 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


6.00E-05 


<NONE> 


<NONE> 


<NONE> 


424 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


7.00E-07 


<NONE> 


<NONE> 


<NONE> 


425 


D43952 


Mouse gene for 
reticulocalbin, 
exonl and 
promoter region 


0.36 


<NONE> 


<NONE> 


<NONE> 


426 


X68553 


C.elegans
repetitive DNA 
sequence 


0.4 


TCB1_RABIT


T-CELL RECEPTOR 
BETA CHAIN 
PRECURSOR (ANA 
11) 


0.11


427 


M83314 


Tomato 
phenylalanine 
ammonia lyase 
(pal) gene, 
complete cds and 
promoter region. 


3.3 


SMB2_HUMAN 


DNA-BINDING 
PROTEIN SMUBP-2 
(GLIAL FACTOR-1)
(GF-1) 


0.65 


428 


AF070636 


Homo sapiens 
clone 24686 
mRNA sequence 


5.00E-23 


<NONE> 


<NONE> 


<NONE> 


429 


<NONE> 


<NONE> 


<NONE> 


IQGA_HUMAN 


RAS GTPASE- 
ACTIVATING-LIKE 
PROTEIN IQGAP1 
(P195)


2.00E-06 


430 


AF068627 


Mus musculus 
DNA cytosine-5 

methyltransferase 
3B2 (Dnmt3b) 
mRNA, 
alternatively 
spliced, complete 
cds 


5.00E-04 


LOX1_LENCU


LIPOXYGENASE 
(EC 1.13.11.12) 


9.9 
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


431 


AF020043 


Homo sapiens 
chromosome- 
associated 
polypeptide 


0 


YJH4_YEAST 


HYPOTHETICAL 
141.3 KD PROTEIN 
IN SCP160-MRPL8 
INTERGENIC 
REGION 


4.00E-16 


432 


K00046 


ross river virus 
26s subgenomic 
rna and junction 
region. 


0.12 


CUL2_HUMAN 


CULLIN HOMOLOG 
2 (CUL-2) 


7.4 


433 


AF005664 


Homo sapiens 
properdin (PFC) 
gene, complete 
cds 


0.005 


UL88_HCMVA 


PROTEIN UL88 


5.8 


434 


Z70705 


H.sapiens mRNA 
(fetal brain cDNA 
com5) 


2.00E-05 


PH87_YEAST 


INORGANIC 
PHOSPHATE 
TRANSPORTER 
PHO87


1.5 


435 


U29156 


Mus musculus 
eps15R mRNA,
complete cds. 


e-125 


EP15_HUMAN 


EPIDERMAL 
GROWTH FACTOR 
RECEPTOR 
SUBSTRATE 15
(PROTEIN EPS15)
(AF-1P PROTEIN) 


1.00E-13 


436 


AE000750 


Aquifex aeolicus 
section 82 of 109 
of the complete 
genome 


0.37 


<NONE> 


<NONE> 


<NONE> 


437 


U49169 


Dictyostelium 
discoideum V- 
ATPase A subunit 
(vatA) mRNA, 
complete cds 


0.12 


VCAP_HSV6U 


MAJOR CAPSID 
PROTEIN (MCP) 


5.6 


438 


AF032871 


Homo sapiens 
uncoupling 
protein 3 (UCP3) 
gene, exon 1 and 
partial exon 2 


0.13 


WEE1_SCHPO


MITOSIS 
INHIBITOR 
PROTEIN KINASE 
WEE1 (EC 2.7.1.-) 


3.7 


439


AB000425 


Porcine DNA for 
endopeptidase 
24.16, exon 16 
and complete cds 


4.00E-32 


<NONE> 


<NONE> 


<NONE> 


440 


U51037 


Mus musculus 11-
zinc-finger 
transcription 
factor 


0.04 


<NONE> 


<NONE> 


<NONE> 
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


441 


AF032456 


Homo sapiens 
ubiquitin 
conjugating 
enzyme G2 


e-110 


<NONE> 


<NONE> 


<NONE> 


442 


AF009288 


Homo sapiens 
clone HEB8 Cri- 
du-chat region 
mRNA 


2.00E-14 


LMG1_HUMAN 


LAMININ GAMMA- 
1 CHAIN 
PRECURSOR 
(LAMININ B2 
CHAIN) 


8.1 


443 


AF024578 


Homo sapiens 
type-1 protein 
phosphatase 
skeletal muscle 
glycogen 
targeting subunit 
(PPP1R3) gene, 
exon 4, and 

complete cds


1.1 


<NONE> 


<NONE> 


<NONE> 


444 


M24486 


Human prolyl 4- 
hydroxylase alpha 
subunit mRNA, 
complete cds, 
clone PA-11. 


0 


DACHA 


<NONE> 


4.00E-58 


445 


X96400 


P.tetraurelia 

alnha-Sl O *?ene 

C4.lL/llCt J 1 IS UjOUv 


0.37 


<NONE> 


<NONE> 


<NONE> 


446 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


447 


X84996 


X.laevis mRNA 
for selenocysteine
tRNA acting
factor (Staf) 


0.12 


POL_MLVRD


POL POLYPROTEIN 
(PROTEASE (EC 
3.4.23.-); REVERSE 
TRANSCRIPTASE 
(EC 2.7.7.49); 
RIBONUCLEASE H 

(EC 3.1.26.4))


2.00E-08 


448 


AF019980


Dictyostelium 
discoideum ZipA 
(zipA) gene, 
partial cds 


3.4 


HMDL_BRAFL 


HOMEOBOX 
PROTEIN DLL 
HOMOLOG 


0.23 


449


X78424 


D.carota (Queen 
Anne's Lace) 
Inv*Dc2 gene, 
3432bp 


0.38 


<NONE> 


<NONE> 


<NONE> 


450 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


451 


X89886 


P.patens mRNA 
for 5- 

aminolevulinate 


1.1 


CKR6_HUMAN 


C-C CHEMOKINE 
RECEPTOR TYPE 6 
(C-C CKR-6) (CCR6) 


9.9 
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






dehydratase 










452 


U67471 


Methanococcus 
jannaschii section 
13 of 150 of the 
complete genome 


0.12 


YR72_ECOLI 


HYPOTHETICAL 
53.2 KD PROTEIN 
(ORF2) (RETRON 
EC67) 


5.8 


453 


AF060246 


Mus musculus 
strain C57BL/6 
zinc finger protein 
106 (Zfp106)
mRNA, H3a-a 
allele, complete 
cds 


1.00E-62


YOJ8_CAEEL 


HYPOTHETICAL 
51.6 KD PROTEIN 
ZK353.8 IN 
CHROMOSOME III 


1.7 


454 


U70667 


Human Fas-ligand 
associated factor 
1 mRNA, partial 
cds 


0 


YKB2_YEAST 


HYPOTHETICAL 
69.1 KD PROTEIN 
IN PUT3-CCE1 
INTERGENIC 
REGION 


3.00E-09 


455 


M95858 


Bos taurus
recoverin mRNA,
complete cds.


0.35 


GIDA_MYCGE 


GLUCOSE 
INHIBITED 
DIVISION PROTEIN 
A 


1.4 


456 


U67594 


Methanococcus 
jannaschii section 
136 of 150 of the 
complete genome 


0.36 


<NONE> 


<NONE> 


<NONE> 


457 


X06747 


Human hnRNP 
core protein A1


3.00E-31 


<NONE> 


<NONE> 


<NONE> 


458 


Z65575 


H.sapiens CpG 
DNA, clone 47c5,
reverse read
cpg47c5.rt1a.


1.3 


<NONE> 


<NONE> 


<NONE> 


459 


X88893 


C.jacchus intron 4 
of visual pigment 
gene 


5.00E-15 


<NONE> 


<NONE> 


<NONE> 


460


M57426 


Maize stripe virus 
RNA3 
nonstructural 
protein 


0.33 


DSC2_MOUSE 


DESMOCOLLIN 
2A/2B PRECURSOR 
(EPITHELIAL TYPE 
2 DESMOCOLLIN) 


6.5 


461 


X01638 


Yeast TEF1 gene
for elongation 
factor EF-1 alpha 


1.1 


PPOL_DROME 


POLY (ADP- 
RIBOSE) 

POLYMERASE (EC 
2.4.2.30) (PARP) 


3.5 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


462 


M60064 


S.typhimurium 
glutamate 1- 
semialdehyde 
aminotransferase 
(hemL) gene, 
complete cds. 


1.1


EPB4_MOUSE 


EPHRIN TYPE-B 
RECEPTOR 4 
PRECURSOR (EC 
2.7.1.112) KINASE 2)
(TYROSINE 
KINASE MYK-1) 


2.5 


463 


X51508 


Rabbit mRNA for 
aminopeptidase N 
(partial) 


0.36


ACHG_XENLA


ACETYLCHOLINE 
RECEPTOR 
PROTEIN, GAMMA 
CHAIN 
PRECURSOR 


1.5


464 


L10106 


Mus musculus 
protein tyrosine 
phosphate
mRNA, complete
cds.


2.00E-58 


VG13_BPML5 


GENE 13 PROTEIN 
(GP13) 


2.5 


465 


M77235 


Human cardiac 
tetrodotoxin- 
insensitive 
voltage-dependent 
sodium channel 
alpha subunit 
(HH1) mRNA,
complete cds. 


3.8 


ZPB0C1 


<NONE> 


6.9 


466 


M58330 


C.maltosa 
autonomously 
replicating 
sequence. 


0.004 


EPB4_MOUSE 


EPHRIN TYPE-B 
RECEPTOR 4 
PRECURSOR (EC 
2.7.1.112) KINASE 2)
(TYROSINE 
KINASE MYK-1) 


2.4 


467 


X51508 


Rabbit mRNA for 
aminopeptidase N 
(partial) 


0.35 


ACHG_XENLA


ACETYLCHOLINE 
RECEPTOR 
PROTEIN, GAMMA 
CHAIN 
PRECURSOR 


2.4 


468 


L10106 


Mus musculus 
protein tyrosine 
phosphate 
mRNA, complete 
cds. 


7.00E-59 


VGLI_PRVRI 


GLYCOPROTEIN 
GP63 PRECURSOR 


4.3 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


469 


U65939 


Azotobacter 
vinelandii GTPase 
(ftsA) gene, 
partial cds, and 
ATP binding 
protein (ftsZ) 
gene, complete 
cds 


1.1 


TRUA_BACSP 


Q45557 bacillus sp. 
(strain ksm-64). trna 
pseudouridine 
synthase a (ec 
4.2.1.70) 
(pseudouridylate 
synthase i) 
(pseudouridine 
synthase i) (uracil 
hydrolyase). 11/98


0.001 


470 


U51037 


Mus musculus 11- 
zinc-finger 
transcription 
factor 


0.041 


<NONE> 


<NONE> 


<NONE> 


471 


M32685 


Human platelet 
glycoprotein IIIa,
exon 14. 


3.6 


<NONE> 


<NONE> 


<NONE> 


472 


U82691 


Phrynocephalus 
raddei CAS 
179770 NADH 

dehydrogenase

subunit 1 (ND1), 
partial cds, tRNA- 
Gln, tRNA-Ile 
and tRNA-Met, 
NADH 

dehydrogenase 
subunit 2 tRNA- 
Cys and tRNA- 
Tyr and c... 


1.1 


<NONE> 


<NONE> 


<NONE> 


473 


D85430 


Mouse Murr1
mRNA, exon 


0.12 


EPA5_CHICK 


EPHRIN TYPE-A 
RECEPTOR 5 
PRECURSOR (EC 
2.7.1.112) 


2.5 


474 


U20661 


Dictyostelium 
discoideum 
unknown internal 
repeat protein 
gene, complete 

cds, and unknown 
orf1, orf2 and
orf3 genes, partial
cds 


0.36 


YHL1_EBV 


HYPOTHETICAL 
BHLF1 PROTEIN 


4.00E-04 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


475 


X56537 


Human novel 
homeobox mRNA 
for a DNA 
binding protein 


0.04 


FA5_HUMAN


COAGULATION 
FACTOR V 
PRECURSOR 
(ACTIVATED 
PROTEIN C 
COFACTOR) 


9.5 


476 


U32843 


Haemophilus 
influenzae Rd 
section 158 of 163 
of the complete 
genome 


5 


<NONE> 


<NONE> 


<NONE> 


477 


U67554 


Methanococcus 
jannaschii section 
96 of 150 of the 
complete genome 


0.36 


<NONE> 


<NONE> 


<NONE> 


478 


AB004244 


Narke japonica 
mRNA for Nj- 
synaphin 1b,
complete cds 


1.1 


NIA1_ORYSA


NITRATE 
REDUCTASE 1 (EC 
1.6.6.1) (NR1) 


1.00E-07 


479 


AF075079 


Homo sapiens full 
length insert 
cDNA YQ80A08 


1.00E-12 


<NONE> 


<NONE> 


<NONE> 


480 


AE000723 


Aquifex aeolicus 
section 55 of 109 
of the complete 
genome 


1 


YKK0_YEAST 


HYPOTHETICAL 
67.5 KD PROTEIN 
IN APE1/LAP4- 
CWP1 INTERGENIC 
REGION 


9.1 


481 


X73902 


H.sapiens mRNA 
for nicein B2 
chain 


0 


LMG2_HUMAN 


LAMININ GAMMA- 
2 CHAIN 
PRECURSOR 


3.00E-93 


482 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


3.00E-10 


P53_CRIGR 


CELLULAR TUMOR 
ANTIGEN P53 


5.7 


483 


AL010240


Plasmodium 

falciparum DNA 
*** 

SEQUENCING 
IN PROGRESS 
*** from contig 
4-64, complete 
sequence 


1.2 


<NONE> 


<NONE> 


<NONE> 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE


484 


U49919 


Arabidopsis 
thaliana lupeol 
synthase mRNA, 
complete cds 


0.54 


YA53_SCHPO 


HYPOTHETICAL 
24.2 KD PROTEIN 
C13A11.03 IN
CHROMOSOME I 


6.00E-10 


485 


AF077618 


Homo sapiens 
p73 gene, exon 3 


0.39 


MYOD_MOUSE 


MYOBLAST 
DETERMINATION 
PROTEIN 1 


2.1 


486 


AF054994 


Homo sapiens 
clone 23832 
mRNA sequence 


0.13 


<NONE> 


<NONE> 


<NONE> 


487 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-10 


<NONE> 


<NONE> 


<NONE> 


488 


AF068627 


Mus musculus
DNA cytosine-5
methyltransferase
3B2 (Dnmt3b)
mRNA,
alternatively
spliced, complete
cds


5.00E-04 


ACE2_YEAST 


METALLOTHIONEI 
N EXPRESSION 
ACTIVATOR 


1.5 


489 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-07 


RINI_PIG 


RIBONUCLEASE 
INHIBITOR 


0.19 


490 


L77886 


Human protein 
tyrosine 
phosphatase 
mRNA, complete 
cds 


1.00E-21 


VS48_TBRVS 


SATELLITE RNA 48 
KD PROTEIN 


1.6 


491 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


5.00E-04 


CRP3_LIMPO 


C-REACTIVE 
PROTEIN 3.3 
PRECURSOR 


3.5 


492 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


8.00E-08 


EPA5_CHICK 


EPHRIN TYPE-A 

RECEPTOR 5 
PRECURSOR (EC 
2.7.1.112) 


2.7 


493 


U95094 


Xenopus laevis 

XL-INCENP 

(XL-INCENP) 


3.00E-09 


<NONE> 


<NONE> 


<NONE> 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE






mRNA, complete 
cds 










494 


U28153 


Caenorhabditis 
elegans UNC-76 
(unc-76) gene, 
complete cds. 


0.37 


<NONE> 


<NONE> 


<NONE> 


495 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


0.37 


NCPR_YEAST 


NADPH- 
CYTOCHROME 
P450 REDUCTASE 
(EC 1.6.2.4) (CPR)


7.00E-05 


496 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


0.013 


YMB3_CAEEL


PROBABLE 
INTEGRIN ALPHA 
CHAIN F54G8.3 
PRECURSOR 


3.3 


497 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


7.00E-07 


<NONE> 


<NONE> 


<NONE> 


498 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


1.00E-10 


<NONE> 


<NONE> 


<NONE> 


499 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-07 


VGLY_LYCVW 


GLYCOPROTEIN
POLYPROTEIN
PRECURSOR
(CONTAINS:
GLYCOPROTEINS
G1 AND G2)


3.2 


500 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


8.00E-06 


HR78_DROME 


NUCLEAR 
HORMONE 
RECEPTOR HR78 
(DHR78) (NUCLEAR 
RECEPTOR 
XR78E/F) 


2.5 


501 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


9.00E-10 


MYSH_BOVIN


MYOSIN I HEAVY 
CHAIN-LIKE 
PROTEIN (MIHC) 
(BRUSH BORDER 
MYOSIN I) (BBMI) 


4.00E-04 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


502 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


2.00E-04 


BAL_HUMAN 


BILE-SALT- 
ACTIVATED 
LIPASE 

PRECURSOR (EC 
3.1.1.3) (EC 3.1.1.13) 
(BAL) (BILE-SALT- 
STIMULATED 
LIPASE) (BSSL) 
ESTERASE) 
(PANCREATIC 
LYSOPHOSPHOLIP 
ASE) 


2.6 


503 


AF080399 


Drosophila 
melanogaster 
mitotic 
checkpoint 
control protein 
kinase BUB1
(Bub1) mRNA,
complete cds 


1.1 


NAT1_YEAST 


N-TERMINAL 
ACETYLTRANSFER 
ASE 1 (EC 2.3.1.88) 


2.00E-23 


504 


U59706 


Gallus gallus 
alternatively 
spliced AMPA
glutamate
receptor, isoform 
GluR2 flop, 
(GluR2) mRNA, 
partial cds. 


0.014 


<NONE> 


<NONE> 


<NONE> 


505 


U95094 


Xenopus laevis
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


2.00E-05 


<NONE> 


<NONE> 


<NONE> 


506 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


2.00E-04 


<NONE> 


<NONE> 


<NONE> 


507 


AF100661


Caenorhabditis 
elegans cosmid 
H20E11 


0.38 


<NONE> 


<NONE> 


<NONE> 


508 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-11 


CA1A_HUMAN 


COLLAGEN ALPHA 
1(X) CHAIN 
PRECURSOR 


0.024 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


509 


U47322 


Cloning vector 
DNA, complete 
sequence. 


2.00E-38 


COA1_SV40 


COAT PROTEIN 
VP1 


6.2 


510 


AF031924 


Homo sapiens 
homeobox 

transcription 

factor barx2 


e-156 


CCMA_HAEIN 


HEME EXPORTER 
PROTEIN A 
(CYTOCHROME C- 
TYPE BIOGENESIS 
ATP-BINDING 
PROTEIN CCMA) 


3.5 


511 


AF010484


Homo sapiens ICI 
YAC 9IA12, right 
end sequence 


3.00E-10 


<NONE> 


<NONE> 


<NONE> 


512 


Z63829 


H. sapiens CpG 
DNA, clone 90h2, 
forward read 
cpg90h2.ftla. 


5.00E-22 


NFIR_MESAU 


NUCLEAR FACTOR 
1 CLONE 
PNF1/RED1 (NF-I) 
(CCAAT-BOX 
BINDING 
TRANSCRIPTION 
FACTOR) (CTF) 
(TGGCA-BINDING 
PROTEIN) 


2.4 


513 


Z35094 


H.sapiens mRNA 
for SURF-2 


5.00E-97 


SUR2_HUMAN 


SURFEIT LOCUS 
PROTEIN 2 


1.00E-46 


514 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


7.00E-06 


<NONE> 


<NONE> 


<NONE> 


515 


D38417 


Mouse mRNA for 
arylhydrocarbon 
receptor, 
complete cds 


e-154 


TEGU_EBV


LARGE TEGUMENT 
PROTEIN 


3.4 


516 


L10911 


Homo sapiens 
splicing factor 
(CC1.4) mRNA,
complete cds. 


e-117 


<NONE> 


<NONE> 


<NONE> 


517 


X17093 


Human HLA-F 
gene for human 
leukocyte antigen 
F 


0.009 


YEN1_SCHPO


O13695

schizosaccharomyces 
pombe (fission yeast), 
hypothetical 52.9 kd 
serine-rich protein 
c11g7.01 in
chromosome i. 11/98


5.4 


518 


AB017026


Mus musculus 
mRNA for 
oxysterol-binding 


0 


OXYB_HUMAN 


OXYSTEROL- 
BINDING PROTEIN 


1.00E-40 
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Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






protein, complete 
cds 










519 


X55038 


Mouse mCENP-B 
gene for 
centromere 
autoantigen B 


0.001 


YNW7_YEAST 


HYPOTHETICAL 
68.8 KD PROTEIN 
IN URE2-SSU72 
INTERGENIC 
REGION 


3.00E-04 


520 


AB018323


Homo sapiens 
mRNA for 
KIAA0780 
protein, partial 
cds 


3.00E-41 


LBR_CHICK 


LAMIN B 
RECEPTOR 


2.3 


521 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


1.00E-10


CA25_HUMAN


PROCOLLAGEN 
ALPHA 2(V) CHAIN 
PRECURSOR 


0.002 


522 


X03558 


Human mRNA 
for elongation 
factor 1 alpha 
subunit 


0 


EF11_HUMAN


ELONGATION 
FACTOR 1-ALPHA 1
(EF-1-ALPHA-1)


e-110 


523 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-11 


YMT8_YEAST 


HYPOTHETICAL 
36.4 KD PROTEIN 
IN NUP116-FAR3 
INTERGENIC 
REGION 


8.00E-07 


524 


AB014591 


Homo sapiens 
mRNA for 
KIAA0691 
protein, complete 
cds 


0 


NOT2_YEAST 


GENERAL 
NEGATIVE 
REGULATOR OF 
TRANSCRIPTION 
SUBUNIT 2 


8.00E-05 


525 


AB019488 


Homo sapiens 
DNA for TRKA, 
exon 17 and 
complete cds 


0 


TRKA_HUMAN 


HIGH AFFINITY 
NERVE GROWTH 
FACTOR 
RECEPTOR 
PRECURSOR 

PROTEIN) (P140- 
TRKA) 


2.00E-27 


526 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


5.00E-15 


CNG4_BOVIN 


240K PROTEIN OF 
ROD 

PHOTORECEPTOR 
CNG-CHANNEL 
CYCLIC- 
NUCLEOTIDE- 
GATED CATION 
CHANNEL 4 (CNG 
CHANNEL 4) 
MODULATORY 
SUBUNIT) 


0.018 


Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


527 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


2.00E-06 


HMZ1_DROME 


ZERKNUELLT 
PROTEIN 1 (ZEN-1) 


0.88 


528 


J03750 


Mouse single 
stranded DNA 
binding protein p9 
mRNA, complete 
cds. 


e-135 


P15_HUMAN 


ACTIVATED RNA 
POLYMERASE II 
TRANSCRIPTIONA 
L COACTIVATOR 
P15 (PC4) (P14) 


3.00E-21 


529 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


1.00E-12 


RS5_DROME 


40S RIBOSOMAL 
PROTEIN S5 


0.42 


530 


Z57610 


H.sapiens CpG 
DNA, clone 
187a10, reverse 
read 

cpg187a10.rt1a. 


8.00E-61 


HN3B_MOUSE 


HEPATOCYTE 
NUCLEAR FACTOR 
3-BETA (HNF-3B) 


4.00E-15 


531 


U95760 


Drosophila 
melanogaster 
strawberry notch 
(sno) mRNA, 
complete cds 


3.00E-60 


<NONE> 


<NONE> 


<NONE> 


532 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


4.00E-11 


<NONE> 


<NONE> 


<NONE> 


533 


U50535 


Human BRCA2 
region, mRNA 
sequence CG006 


4.00E-12 


ALU1_HUMAN 


!!!! ALU 
SUBFAMILY J 
WARNING ENTRY 
!!!! 


1.1 


534 


X92841 


H.sapiens MICA 
gene 


1.00E-55 


LIN1_HUMAN 


LINE-1 REVERSE 
TRANSCRIPTASE 
HOMOLOG 


6.00E-09 


535 


U60337 


Homo sapiens 
beta-mannosidase 
mRNA, complete 
cds 


0 


NODC_BRAEL 


N- 

ACETYLGLUCOSA 
MINYLTRANSFERA 
SE (EC 2.4.1.-) 


1.4 


536 


M21731 


Human lipocortin- 
V mRNA, 
complete cds. 


e-169 


ANX5_HUMAN 


ANNEXIN V 
(LIPOCORTIN V) 
(ENDONEXIN II) 
(CALPHOBINDIN I) 
(CBP-I) 
(PLACENTAL 
ANTICOAGULANT 
PROTEIN I) (PAP-I) 
ANTICOAGULANT- 
ALPHA) (VAC- 
ALPHA) 

(ANCHORIN CII) 


1.00E-05 


537 


Y08013 


S.salar DNA 
segment 
containing GT 
repeat 


0.006 


<NONE> 


<NONE> 


<NONE> 


538 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


539 


M98502 


Mus musculus 
protein encoding 
twelve zinc finger 
proteins (pMLZ- 
4) mRNA, 
complete cds. 


2.00E-17 


DYNA_CHICK 


DYNACTIN, 117KD 
ISOFORM 


7.4 


540 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


6.00E-05 


HXA3_HAEIN 


HEME:HEMOPEXIN 
-BINDING PROTEIN 
PRECURSOR 


2.6 


541 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


1.00E-13 


AMO_KLEAE 


AMINE OXIDASE 
PRECURSOR (EC 
1.4.3.6) 

(MONAMINE 
OXIDASE) 
(TYRAMINE 
OXIDASE) 


1.5 


542 


AF083322 


Homo sapiens 
centriole 

associated protein 
CEP110 mRNA, 
complete cds 


e-133 


CA34_HUMAN 


PROCOLLAGEN 
ALPHA 3(IV) 
CHAIN 
PRECURSOR 


1.5 


543 


J03746 


Human 
glutathione S- 
transferase 
mRNA, complete 
cds. 


e-170 


GTMI_HUMAN 


GLUTATHIONE S- 
TRANSFERASE, 
MICROSOMAL (EC 
2.5.1.18) 


5.00E-39 


544 


U67522 


Methanococcus 
jannaschii section 

complete genome 


0.37 


A1AA_HUMAN 


ALPHA-1A 
ADRENERGIC 
RECEPTOR 


4.3 


545 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-07 


<NONE> 


<NONE> 


<NONE> 


546 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


547 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


548 


D87001 


Human (lambda) 
DNA for 
immunoglobulin 
light chain 


0.35 


VAL3_TYLCU 


AL3 PROTEIN (C3 
PROTEIN) 


3.2 


549 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


3.00E-08 


TEGU_HSV11 


LARGE TEGUMENT 
PROTEIN (VIRION 
PROTEIN UL36) 


0.004 


550 


D16991 


Human HepG2 
partial cDNA, 
clone 

hmd2d01m5 


8.00E-09 


PTM1_YEAST 


PROTEIN PTM1 
PRECURSOR 


0.033 


551 


M34025 


Human fetal Ig 
heavy chain 
variable region 


3.2 


<NONE> 


<NONE> 


<NONE> 


552 


M98502 


Mus musculus 
protein encoding 
twelve zinc finger 
proteins (pMLZ- 
4) mRNA, 
complete cds. 


5.00E-14 


<NONE> 


<NONE> 


<NONE> 


553 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


0.002 


<NONE> 


<NONE> 


<NONE> 


554 


Z78730 


H.sapiens flow- 
sorted 

chromosome 6 
HindIII fragment, 
SC6pA15C3 


3.00E-20 


ALU1_HUMAN 


!!!! ALU 
SUBFAMILY J 
WARNING ENTRY 
!!!! 


5.00E-06 


555 


U74496 


Human 

chromosome 4q35 

subtelomeric 

sequence 


8.00E-08 


ICP4_VZVD 


TRANS-ACTING 
TRANSCRIPTIONA 
L PROTEIN ICP4 


0.39 


556 


U39875 


Rattus norvegicus 
EF-hand Ca2+- 
binding protein 
p22 mRNA, 
complete cds. 


2.00E-56 


YHFK_ECOLI 


HYPOTHETICAL 
79.5 KD PROTEIN 
IN CRP-ARGD 
INTERGENIC 
REGION (O696) 


9.8 


557 


U65416 


Human MHC 
class I molecule 
(MICB) gene, 
complete cds 


0.12 


<NONE> 


<NONE> 


<NONE> 


558 


AG000037 


Homo sapiens 
genomic DNA, 
21q region, clone: 
9H11A22 


5.00E-25 


<NONE> 


<NONE> 


<NONE> 


559 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


5.00E-05 


<NONE> 


<NONE> 


<NONE> 


560 


AB007918 


Homo sapiens 
mRNA for 
KIAA0449 
protein, partial 
cds 


0.015 


VGLE_HSV11 


GLYCOPROTEIN E 
PRECURSOR 


2.2 


561 


U58884 


Mus musculus 
SH3-containing 
protein SH3P7 
mRNA, complete 
cds. similar to 
Human Drebrin 


1.00E-73 


YCV2_YEAST 


HYPOTHETICAL 
13.8 KD PROTEIN 
IN PWP2-SUP61 

INTERGENIC 
REGION 


2.6 


562 


AB007878 


Homo sapiens 
KIAA0418 
mRNA, complete 
cds 


e-110 


GLU2_MAIZE 


GLUTELIN 2 
PRECURSOR (ZEIN- 
GAMMA) (27 KD 
ZEIN) 


0.72 


563 


AF065482 


Homo sapiens 
sorting nexin 2 
(SNX2) mRNA, 
complete cds 


0 


YJD6_YEAST 


HYPOTHETICAL 
49.0 KD PROTEIN 
IN NSP1-KAR2 
INTERGENIC 
REGION 


1.4 


564 


U27873 


Stealth virus 1 
clone 3B11T7 


0.002 


SYN1_HUMAN 


SYNAPSINS IA 
AND IB (BRAIN 
PROTEIN 4.1) 


1.6 


565 


L38951 


Homo sapiens 
importin beta 
subunit mRNA, 
complete cds 


2.00E-68 


VP2_BRD 


STRUCTURAL 
CORE PROTEIN 
VP2 


1.1 


566 


AF007155 


Homo sapiens 
clone 23763 
unknown mRNA, 
partial cds 


e-165 


YOHI_AZOVI 


HYPOTHETICAL 
33.2 KD PROTEIN 
IN IBPB 5'REGION 


7.5 


567 


Z56295 


H.sapiens CpG 
DNA, clone 10c2, 
forward read 
cpg10c2.ft1a. 


0.12 


A1AB_CANFA 


ALPHA-1B 
ADRENERGIC 
RECEPTOR 
(FRAGMENT) 


0.85 


568 


Z83792 


G.gallus 
microsatellite 
DNA (LEI0222 


0.12 


<NONE> 


<NONE> 


<NONE> 


569 


U11820 


Feline 

immunodeficienc 
y virus 

USIL2489_7B 
gag polyprotein 
(gag) gene, 
complete cds, 
polymerase 
polyprotein (pol) 
gene, partial cds, 
vif protein (vif), 
complete cds, and 
envelope 
glycoprotein 
(env), complete 
cds, complete g... 


1.1 


<NONE> 


<NONE> 


<NONE> 


570 


M18065 


Mouse 18S and 
28S ribosomal 

DNA, 5' 
hypervariable 
(Vr) region, clone 
M1. 


6.00E-04 


CC40_YEAST 


CELL DIVISION 
CONTROL 
PROTEIN 40 


3.7 


571 


AF053645 


Homo sapiens 
cellular apoptosis 
susceptibility 
protein (CSE1) 
gene, exons 3 
through 10 


2.00E-07 


YMQ4_CAEEL 


HYPOTHETICAL 
25.8 KD PROTEIN 
K02D10.4 IN 
CHROMOSOME III 


4.3 


572 




Human 2.5 kb 
mRNA for 
cytoskeletal 
tropomyosin 

TM30(nm) 


0 


<NONE> 


<NONE> 


<NONE> 


573 


AC001159 


Homo sapiens 
(subclone 1_h9 
from PAC H92) 
DNA sequence 


5.00E-04 


XYND_CELFI 


ENDO-1,4-BETA- 
XYLANASE D 
PRECURSOR (EC 
3.2.1.8) 


7.3 


574 


Z60625 


H.sapiens CpG 
DNA, clone 2c10, 
forward read 
cpg2c10.ft1a. 


4.00E-13 


<NONE> 


<NONE> 


<NONE> 


575 


AF070640 


Homo sapiens 
clone 24781 
mRNA sequence 


e-164 


<NONE> 


<NONE> 


<NONE> 


576 


Y11306 


Homo sapiens 

mRNA for hTCF- 
4 


2.00E-48 


TCF1_HUMAN 


T-CELL-SPECIFIC 
TRANSCRIPTION 


2.00E-15 


577 


X65279 


pWE15 cosmid 
vector DNA 


7.00E-69 


OCLN_POTTR 


Q28793 potorous 
tridactylus (potoroo). 
occludin. 11/98 


0.71 


578 


Miuzyo 


Mouse DNA with 
homology to EBV 
IR3 repeat, 
segment 1, clone 
Mu2. 


0.001 


LMB1_HYDAT 


LAMININ BETA-1 
CHAIN 
PRECURSOR 
(FRAGMENTS) 


1.9 


579 


X53744 


Canine mRNA for 
68kDA subunit of 
signal recognition 
particle (SRP68) 


e-162 


SR68_CANFA 


SIGNAL 
RECOGNITION 
PARTICLE 68 KD 
PROTEIN (SRP68) 


5.00E-16 


580 


AF086438 


Homo sapiens full 
length insert 
cDNA clone 
ZD80G11 


2.00E-04 


<NONE> 


<NONE> 


<NONE> 


581 


U15140 


Mycobacterium 
bovis ribosomal 
proteins IF-1 
complete cds, and 
S4 (rpsD) gene, 
partial cds 


1.3 


<NONE> 


<NONE> 


<NONE> 


582 


D13292 


Human mRNA 
for ryudocan core 
protein 


e-166 


RSP4_ARATH 


40S RIBOSOMAL 
PROTEIN SA (P40) 
(LAMININ 
RECEPTOR 
HOMOLOG) 


1.4 


583 


S71022 


neoplasm-related 
CI 40 product 
[human, thyroid 
carcinoma cells, 
mRNA, 670 nt] 


9.00E-30 


RL6_HUMAN 


60S RIBOSOMAL 
PROTEIN L6 (TAX- 
RESPONSIVE 
ENHANCER 
ELEMENT BINDING 
PROTEIN 107) 
(TAXREB107) 


5.6 


584 


L20934 


Anopheles 
gambiae complete 
mitochondrial 
genome 


0.014 


<NONE> 


<NONE> 


<NONE> 


585 


Z49269 


H.sapiens gene 
for chemokine 
HCC-1. 


1.1 


AMY1_DICTH 


ALPHA-AMYLASE 
1 (EC 3.2.1.1) (1,4- 
ALPHA-D-GLUCAN 
GLUCANOHYDROL 
ASE) 


2.5 


586 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


2.00E-04 


<NONE> 


<NONE> 


<NONE> 


587 


AF029893 


Homo sapiens i- 

beta-l,3-N- 

acetylglucosamin 

yltransferase 

mRNA, complete 

cds 


0.13 


HEMO_PIG 


HEMOPEXIN 
PRECURSOR 
(HYALURONIDASE 
) (EC 3.2.1.35) 


3.5 


588 


J05109 


T.thermophila 
calcium-binding 
25 kDa (TCBP 
25) protein gene, 
complete cds. 


0.014 


<NONE> 


<NONE> 


<NONE> 


589 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


6.00E-04 


<NONE> 


<NONE> 


<NONE> 


590 


AF060246 


Mus musculus 
strain C57BL/6 
zinc finger protein 
106 (Zfp106) 
mRNA, H3a-a 

allele, complete 
cds 


1.00E-83 


SCRB_PEDPE 


SUCROSE-6- 
PHOSPHATE 
HYDROLASE (EC 
3.2.1.26) (SUCRASE) 


10 


591 


Y11966 


B.aphidicola (host 
T.suberi) plasmid 
pBTs1 genes 
leuA, hspA, 
repA2, repA1, 
leuB, leuC, leuD, 
leuA 


0.37 


<NONE> 


<NONE> 


<NONE> 


592 


U20428 


Human SNC19 
mRNA sequence 


1.00E-64 


YY22_MYCTU 


HYPOTHETICAL 
30.8 KD PROTEIN 
CY49.22 


0.29 


593 


AF043084 


Lycopersicon 
esculentum 
ethylene receptor 
homolog (ETR1) 
mRNA, complete 
cds 




0.37 


KNIR_DROME 


ZYGOTIC GAP 
PROTEIN KNIRPS 


9.9 


594 


X65279 


pWE15 cosmid 
vector DNA 


5.00E-66 


COA1_SV40 


COAT PROTEIN 
VP1 


0.001 


595 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


0.041 


UL88_HSV7J 


PROTEIN U59 


5.8 


596 


M91452 


Sus scrofa 
ryanodine 
receptor (RYR1) 
gene, complete 

cds. 


3.2 


<NONE> 


<NONE> 


<NONE> 


597 


U77327 


Human Ki-1/57 
intracellular 
antigen mRNA, 
partial cds 


e-158 


GAT1_CHICK 


ERYTHROID 
TRANSCRIPTION 
FACTOR (GATA-1) 
(ERYF1) 


1.2 


598 


U77327 


Human Ki-1/57 
intracellular 
antigen mRNA, 
partial cds 


0 


RPB7_ARATH 


DNA-DIRECTED 
RNA POLYMERASE 
II 19 KD 

POLYPEPTIDE (EC 
2.7.7.6) (RNA 
POLYMERASE II 
SUBUNIT 5) 


6.2 


599 


Y16964 


Saccharomyces 
sp. mitochondrial 
DNA for OLI1 
gene, strain CID1 


0.37 


NMD5_YEAST 


NONSENSE- 
MEDIATED MRNA 
DECAY PROTEIN 5 


1.9 


600 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


6.00E-06 


<NONE> 


<NONE> 


<NONE> 


601 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


8.00E-08 


<NONE> 


<NONE> 


<NONE> 


602 


AF091046 


Brugia pahangi 
nuclear hormone 
receptor (bhr-1) 
gene, partial cds 


1.1 


INVO_PONPY 


INVOLUCRIN 


0.23 


603 


M87339 


Human 

replication factor 
C, 37-kDa subunit 
mRNA, complete 
cds 


0 


AC12_HUMAN 


ACTIVATOR 1 37 
KD SUBUNIT 
(REPLICATION 
FACTOR C 37 KD 
SUBUNIT) (A1 37 
KD SUBUNIT) (RF- 
C 37 KD SUBUNIT) 
(RFC37) 


1.00E-38 


604 


D28116 


Human genes for 
collagen type IV 
alpha 5 and 6, 
exon 1 and exon 
V 


0.39 


<NONE> 


<NONE> 


<NONE> 


605 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-06 


<NONE> 


<NONE> 


<NONE> 


606 


AE001149 


Borrelia 
burgdorferi 
(section 35 of 70) 
of the complete 
genome 


0.13 


<NONE> 


<NONE> 


<NONE> 


607 


X14168 


Human pLC46 
with DNA 
replication origin 


6.00E-16 


Z136_HUMAN 


ZINC FINGER 
PROTEIN 136 


0.31 




Docket No. 1480P 

Table 2 




Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


608 


Z57610 


H.sapiens CpG 
DNA, clone 
187a10, reverse 
read 

cpg187a10.rt1a. 


7.00E-90 


HN3B_RAT 


HEPATOCYTE 
NUCLEAR FACTOR 
3-BETA (HNF-3B) 


1.00E-19 


609 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


0.043 


PGCV_MOUSE 


VERSICAN CORE 

PROTEIN 

PRECURSOR 

(LARGE 

FIBROBLAST 

PROTEOGLYCAN) 

(CHONDROITIN 

SULFATE 

PROTEOGLYCAN 

CORE PROTEIN 2) 

(PG-M) 


3.5 


610 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


7.00E-07 


CA11_CHICK 


PROCOLLAGEN 
ALPHA 1(I) CHAIN 
PRECURSOR 


0.4 


611 


AB007956 


Homo sapiens 
mRNA, 
chromosome 1 
specific transcript 
KIAA0487 


e-106 


RRPB_CVMA5 


RNA-DIRECTED 
RNA POLYMERASE 
(EC 2.7.7.48) 
(ORF1B) 


9.7 


612 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


0.005 


<NONE> 


<NONE> 


<NONE> 


613 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


6.00E-05 


UL52_EBV 


HELICASE/PRIMAS 
E COMPLEX 
PROTEIN 
(PROBABLE DNA 
REPLICATION 
PROTEIN BSLF1) 


5.9 


614 


U95760 


Drosophila 
melanogaster 
strawberry notch 
(sno) mRNA, 
complete cds 


3.00E-71 


POLG_PVYHU 


GENOME 
POLYPROTEIN 
(CONTAINS: N- 
TERMINAL 
PROTEIN; HELPER 
COMPONENT 
PROTEINASE (EC 
3.4.22.-) (HC-PRO); 
42-50 KD PROTEIN; 
CYTOPLASMIC 
INCLUSION 
PROTEIN (CI); 6 KD 
PROTEIN; 
NUCLEAR 
INCLUSION 
PROTEIN A (NI-A) 
(EC 3.4.22.-) (49K 
PROTEINASE) (49 


4.3 


615 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


9.00E-09 


VP3_ROTPC 


INNER CORE 
PROTEIN VP3 


7.7 


616 


J05499 


Rattus norvegicus 
L-glutamine 
amidohydrolase 
mRNA, complete 
cds 


e-143 


GLSL_RAT 


GLUTAMINASE, 
LIVER ISOFORM 
PRECURSOR (EC 
3.5.1.2)(GLS) 


7.00E-67 


617 


M19262 


Rat clathrin light 
chain (LCB3) 
mRNA, complete 
cds. 


0.37 


Y642_METJA 


HYPOTHETICAL 
PROTEIN MJ0642 


5.8 


618 


M21191 


Human aldolase 
pseudogene 
mRNA, complete 
cds. 


1.00E-32 


LIN1_NYCCO 


LINE-1 REVERSE 
TRANSCRIPTASE 
HOMOLOG 


6.00E-17 


619 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


1.00E-11 


NUCM_BOVIN 


NADH- 
UBIQUINONE 
OXIDOREDUCTASE 
49 KD SUBUNIT (EC 
1.6.5.3) (EC 1.6.99.3) 
(COMPLEX I-49KD) 
(CI-49KD) 


0.044 


620 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


0.005 


HEMZ_RHOCA 


FERROCHELATASE 

(EC 4.99.1.1) 
(PROTOHEME 
FERRO-LYASE) 


4.4 


621 


AF041428 


Homo sapiens 
ribosomal protein 
s4 X isoform 
gene, complete 
cds 


0.002 


<NONE> 


<NONE> 


<NONE> 


622 


X07158 


Chironomus 
thummi DNA for 
Cla repetitive 
element 


0.13 


<NONE> 


<NONE> 


<NONE> 


623 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


8.00E-04 


<NONE> 


<NONE> 


<NONE> 


624 


AF100470 


Rattus norvegicus 
ribosome attached 
membrane protein 
4 (RAMP4) 
mRNA, complete 
cds 


1.00E-53 


<NONE> 


<NONE> 


<NONE> 


625 


U85193 


Human nuclear 
factor I-B2 
(NFIB2) mRNA, 
complete cds 


2.00E-38 


<NONE> 


<NONE> 


<NONE> 


626 


M13452 


Human lamin A 
mRNA, 3'end. 


6.00E-16 


<NONE> 


<NONE> 


<NONE> 


627 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


0.014 


ACDV_RAT 


ACYL-COA 
DEHYDROGENASE, 
VERY-LONG- 
CHAIN SPECIFIC 
PRECURSOR (EC 
1.3.99.-) (VLCAD) 


4.00E-20 


628 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


3.00E-10 


<NONE> 


<NONE> 


<NONE> 


629 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


630 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-05 


<NONE> 


<NONE> 


<NONE> 


631 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


6.00E-05 


<NONE> 


<NONE> 


<NONE> 


632 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


6.00E-05 


YS83_CAEEL 


HYPOTHETICAL 
86.9 KD PROTEIN 
ZK945.3 IN 
CHROMOSOME II 


0.65 


633 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-09 


NRP_MOUSE 


NEUROPILIN 
PRECURSOR (A5 
PROTEIN) 


2.7 


634 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


2.00E-05 


Y4JN_RHISN 


HYPOTHETICAL 
16.3 KD PROTEIN 
Y4JN 


5.9 


635 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


6.00E-05 


<NONE> 


<NONE> 


<NONE> 


636 


X64707 


H.sapiens BBC1 
mRNA 


e-179 


RL13_HUMAN 


60S RIBOSOMAL 
PROTEIN L13 
(BREAST BASIC 
CONSERVED 
PROTEIN 1) 


5.00E-40 


637 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-08 


<NONE> 


<NONE> 


<NONE> 


638 


X14168 


Human pLC46 
with DNA 
replication origin 


5.00E-14 


SP3_HUMAN 


TRANSCRIPTION 
FACTOR SP3 (SPR- 


0.19 


639 


X90999 


H.sapiens mRNA 
for Glyoxalase II 


9.00E-20 


GLO2_HUMAN 


HYDROXYACYLGL 
UTATHIONE 
HYDROLASE (EC 
3.1.2.6) 


0.007 


640 


AF083322 


Homo sapiens 
centriole 

associated protein 
CEP110 mRNA, 
complete cds 


9.00E-51 


KIF4_MOUSE 


KINESIN-LIKE 
PROTEIN KIF4 


0.005 


641 


Z12002 


M.musculus Pvt-1 
mRNA. 


0.36 


CP5F_CANTR 


CYTOCHROME 
P450 LIIA6 
(ALKANE- 
INDUCIBLE) (EC 
1.14.14.1) (P450- 
ALK3) 


5.6 


642 


M10206 


R.sphaeroides 
reaction center L 
subunit (complete 
cds) and M 
subunit (5' end) 
genes. 


1.1 


YGR1_YEAST 


HYPOTHETICAL 
34.8 KD PROTEIN 
IN SUT1-RCK1 
INTERGENIC 
REGION 


0.006 


643 


K02668 


E. coli ddl gene 
encoding D- 

alanine:D-alanine 
ligase and ftsQ 
and ftsA genes, 
complete cds, and 

ftsZ gene, 5' end. 


3.3 


ANKB_HUMAN 


ANKYRIN, BRAIN 
VARIANT 1 
(ANKYRIN B) 
(ANKYRIN, 
NONERYTHROID) 


7.00E-07 


644 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


645 


X53616 


C.domesticus 
calnexin (pp90) 
mRNA 


1.1 


<NONE> 


<NONE> 


<NONE> 


646 


X57010 


Human COL2A1 
gene for collagen 
II alpha 1 chain, 
exons E2-E15 


3.3 


PRIO_PIG 


MAJOR PRION 
PROTEIN 

PRECURSOR (PRP) 


1.9 


647 


U95097 


Xenopus laevis 
mitotic 

phosphoprotein 
43 mRNA, partial 
cds 


1.1 


UL07_HSV2H 


PROTEIN UL7 


7.3 


648 


X52956 


Human CAMII- 
psi3 calmodulin 
retropseudogene 


0.37 


PRTP_EBV 


PROBABLE 
PROCESSING AND 
TRANSPORT 
PROTEIN 


7.5 


649 


M93425 


Human protein 

tyrosine 

phosphatase 

(PTP-PEST) 

mRNA, complete 

cds. 


0 


PTNC_HUMAN 


PROTEIN- 
TYROSINE 
PHOSPHATASE G1 
(EC 3.1.3.48) 
(PTPG1) 


e-107 


650 


L47615 


Mus musculus 
DNA-binding 
protein (Fli-1) 
gene, 5' end of 
cds. 


0.13 


YA53_SCHPO 


HYPOTHETICAL 
24.2 KD PROTEIN 
C13A11.03 IN 
CHROMOSOME I 


2.00E-07 


651 


U60337 


Homo sapiens 
beta-mannosidase 
mRNA, complete 
cds 


0 


GIL1_ENTHI 


GALACTOSE- 
INHIBITABLE 

LECTIN 170 KD 

SUBUNIT 


0.22 


652 


U08813 


Oryctolagus 

cuniculus 

Na+/glucose 

cotransporter- 

related protein 

mRNA, complete 

cds. 


1.00E-22 


NAG1_HUMAN 


SODIUM/GLUCOSE 

COTRANSPORTER 

1 (NA(+)/GLUCOSE 

COTRANSPORTER 

1) (HIGH AFFINITY 

SODIUM-GLUCOSE 

COTRANSPORTER) 


0.1 


653 


Y00282 


Human mRNA 
for ribophorin II 


2.00E-78 


RIB2_HUMAN 


DOLICHYL- 
DIPHOSPHOOLIGO 
SACCHARIDE— 
PROTEIN 

GLYCOSYLTRANS 
FERASE 63 KD 
SUBUNIT 
PRECURSOR (EC 
2.4.1.119) 
(RIBOPHORIN II) 


5.00E-19 


654 


D10051 


Human gene for 
92-kDa type IV 
collagenase, 5'- 
flanking region 


0.014 


TAGB_DICDI 


PRESTALK- 
SPECIFIC PROTEIN 
TAGB PRECURSOR 
(EC 3.4.21.-) 


7.6 


655 


M29930 


Human insulin 
receptor (allele 2) 
gene, exons 14, 
15, 16 and 17. 


8.00E-08 


<NONE> 


<NONE> 


<NONE> 


656 


U78310 


Homo sapiens 
pescadillo 
mRNA, complete 
cds 


0 


YG2S_YEAST 


HYPOTHETICAL 
69.9 KD PROTEIN 
IN MIC1-SRB5 
INTERGENIC 
REGION 


0.002 


657 


X68792 


S.coelicolor 
A3(2) promoter 
sequence pth270 


3.2 


YBS0_YEAST 


HYPOTHETICAL 
27.0 KD PROTEIN 
IN VAL1-HSP26 
INTERGENIC 
REGION 


0.073 


658 


U50535 


Human BRCA2 
region, mRNA 
sequence CG006 


4.00E-12 


ALU1_HUMAN 


!!!! ALU 
SUBFAMILY J 
WARNING ENTRY 
!!!! 


1.2 


659 


U15522 


Sus scrofa clone 
pvglalg heavy 
chain variable 
VDJ region 
mRNA, partial 
cds. 


3.2 


Z165_HUMAN 


ZINC FINGER 
PROTEIN 165 


3.2 


660 


M20918 


C.thummi piger 
haemoglobin (Hb) 
gene DNA, 
complete cds. 


0.12 


YT25_CAEEL 


HYPOTHETICAL 
59.9 KD PROTEIN 
B0304.5 IN 
CHROMOSOME II 


0.033 


661 


U60337 


Homo sapiens 
beta-mannosidase 
mRNA, complete 
cds 


0 


<NONE> 


<NONE> 


<NONE> 


662 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


0.001 


ENV_MLVFP 


ENV POLYPROTEIN 
PRECURSOR 
(CONTAINS: KNOB 
PROTEIN GP70; 
SPIKE PROTEIN 
P15E; R PROTEIN) 


3.3 


663 


M97287 


Human 

MAR/SAR DNA 
binding protein 
(SATB1) mRNA, 
complete cds. > :: 
gb|I58691|I58691 
Sequence 1 from 
patent US 
5652340 


0 


SAT1_HUMAN 


DNA-BINDING 
PROTEIN SATB1 
(SPECIAL AT-RICH 
SEQUENCE 
BINDING PROTEIN 

1) 


2.00E-20 


664 


L42612 


Homo sapiens 
keratin 6 isoform 
K6f (KRT6F) 
mRNA, complete 
cds 


e-168 


K2C4_BOVIN 


KERATIN, TYPE II 
CYTOSKELETAL 59 
KD, COMPONENT 
IV 


4.00E-10 


665 


U17901 


Rattus norvegicus 
phospholipase A- 
2-activating 
protein (plap) 
mRNA, complete 
cds. 


e-152 


PLAP_MOUSE 


PHOSPHOLIPASE 
A-2-ACTIVATING 
PROTEIN (PLAP) 


4.00E-13 


666 


M73047 


Homo sapiens 
tripeptidyl 
peptidase II 
mRNA, complete 
cds. 


0 


MERT_STRLI 


MERCURIC 

TRANSPORT 

PROTEIN 

(MERCURY ION 

TRANSPORT 

PROTEIN) 


4.4 


667 


U09954 


Human ribosomal 
protein L9 gene, 
5' region and 
complete cds. 


0 


RL9_HUMAN 


60S RIBOSOMAL 
PROTEIN L9 


2.00E-11 


668 


X98330 


H.sapiens mRNA 
for ryanodine 
receptor 2 


1.1 


HS74_MOUSE 


HEAT SHOCK 70 
KD PROTEIN AGP-2 


0.034 


669 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


0.002 


RPC2_DROME 


DNA-DIRECTED 
RNA POLYMERASE 

III 128 KD 
POLYPEPTIDE 


1.1 


670 


AF069250 


Homo sapiens 

okadaic acid- 

inducible 

phosphoprotein 

(OA48-18) 
mRNA, complete 

cds 


7.00E-80 


LEGB_PEA 


LEGUMIN B 
(FRAGMENT) 


0.011 


671 


Z71419 


S.cerevisiae 
chromosome XIV 
reading frame 
ORF YNL143C 


1.1 


FOCD_ECOLI 


OUTER 
MEMBRANE 
USHER PROTEIN 
FOCD PRECURSOR 


9.7 


672 


AF044965 


Homo sapiens 
polio virus related 
protein 2 gene, 
alpha isoform, 
exon 6 and partial 
cds 


e-167 


PVR_MOUSE 


POLIOVIRUS 
RECEPTOR 
HOMOLOG 
PRECURSOR 


1.00E-12 


673 


add j i y 


Cloning vector 
pCAT-Enhancer 


2.00E-80 


S106_HUMAN 


CALCYCLIN 
(PROLACTIN 
RECEPTOR 
ASSOCIATED 
PROTEIN) 
CALCIUM- 
BINDING PROTEIN 
A6) 


3.00E-15 


674 


D29655 


Pig mRNA for 
UMP-CMP 
kinase, complete 
cds 


e-103 


V319_ASFB7 


J319 PROTEIN 


4.3 


675 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


8.00E-08 


VEGR_RAT 


VASCULAR 
ENDOTHELIAL 
GROWTH FACTOR 
RECEPTOR 1 
PRECURSOR 
RECEPTOR FLT) 
(FLT-1) 


3.3 


676 


D90217 


S. cerevisiae gene 
for YmL33, 
mitochondrial 
ribosomal 
proteins of large 
subunit 


2.00E-07 


MALY_ECOLI 


MALY PROTEIN 
(EC 2.6.1.-) 


5.6 


677 


AF038952 


Homo sapiens 
cofactor A protein 
mRNA, complete 
cds 


e-160 


TlCA_MOUSE 


TCP1-CHAPERONIN 
COFACTOR A 


4.00E-19 


678 


Z96950 


Gorilla gorilla 
DNA sequence 
orthologous to the 
human Xp:Yp 
telomere-junction 
region 


5.00E-14 


YHBZ_ECOLI 


HYPOTHETICAL 
43.3 KD GTP- 
BINDING PROTEIN 
IN DACB-RPMA 
INTERGENIC 
REGION (F390) 


3.3 


679 


D50418 


Mouse mRNA for 
AREC3, partial 
cds 


2.00E-79 


CYGX_RAT 


OLFACTORY 
GUANYLYL 
CYCLASE GC-D 
PRECURSOR (EC 
4.6.1.2) 


1.1 


680 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


8.00E-08 


P2C2_SCHPO 


PROTEIN 

PHOSPHATASE 2C 
HOMOLOG 2 (EC 
3.1.3.16) 


1.00E-04 


681 


AL010280 


Plasmodium 

falciparum DNA 
*** 

SEQUENCING 
IN PROGRESS 
*** from contig 
4-106, complete 
sequence 


0.12 


<NONE> 


<NONE> 


<NONE> 


682 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


5.00E-04 


VSM2_TRYBB 


VARIANT 
SURFACE 
GLYCOPROTEIN 

MITAT 1.2 

PRECURSOR (VSG 
221) 


4.3 


683 


U00238 


Homo sapiens 
glutamine PRPP 
amidotransferase 
(GPAT) mRNA, 
complete cds 


0 


<NONE> 


<NONE> 


<NONE> 


684 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


0.005 


PRPR_SALTY 


PROPIONATE 

CATABOLISM 

OPERON 

REGULATORY 

PROTEIN 


1.5 


685 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


7.00E-07 


YAND_SCHPO 


HYPOTHETICAL 
30.4 KD PROTEIN 
C3H1.13 IN 
CHROMOSOME I 


0.38 


686 


D25538 


Human mRNA 
for KIAA0037 
gene, complete 
cds 


0 


<NONE> 


<NONE> 


<NONE> 


687 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-07 


A1AA_RAT 


ALPHA-1A 
ADRENERGIC 
RECEPTOR (RA42) 


4.4 


688 


L26956 


Mesocricetus 
auratus stearyl- 
CoA desaturase 
sequence 
including male 
hormone 
dependent gene 
derived from 
hamster 
frankorgan 


4.00E-33 


<NONE> 


<NONE> 


<NONE> 


689 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-10 


<NONE> 


<NONE> 


<NONE> 


690 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-09 


Y093_CAEEL 


HYPOTHETICAL 
58.5 KD PROTEIN 
T20B12.3 IN 
CHROMOSOME III 


2.00E-08 


691 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


8.00E-09 


<NONE> 


<NONE> 


<NONE> 


692 


AB017026 


Mus musculus 
mRNA for 

oxysterol-binding 
protein, complete 
cds 


0 


OXYB_RABIT 


OXYSTEROL- 
BINDING PROTEIN 


1.00E-34 


693 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


6.00E-04 


UFO2_MAIZE 


FLAVONOL 3-O- 
GLUCOSYLTRANS 
FERASE (EC 
2.4.1.91) 


3.1 


694 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


5.00E-04 


<NONE> 


<NONE> 


<NONE> 


695 


U34954 


Caenorhabditis 
elegans 
cyclophilin 
isoform 10 


5.00E-24 


CYPA_CAEEL 


PEPTIDYL-PROLYL 
CIS-TRANS 
ISOMERASE 10 (EC 
5.2.1.8) 


2.00E-29 


696 


AB011167


Homo sapiens 
mRNA for 
KIAA0595 
protein, partial 
cds 


0 


RFX5_HUMAN 


BINDING 

REGULATORY 

FACTOR 


2.1 


697 


U03886 


Human GS2 
mRNA, complete 
cds. 


2.00E-28 


SKD1_MOUSE


SKD1 PROTEIN 


4.00E-17 


698 


AF086275 


Homo sapiens full 
length insert 
cDNA clone 
ZD45C02 


3.00E-41 


SPT7_YEAST 


TRANSCRIPTIONAL
ACTIVATOR SPT7


0.82 


699 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-10 


CA1E_HUMAN 


COLLAGEN ALPHA 
1(XV) CHAIN 
PRECURSOR 


1.1 


700 


U95102


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


4.00E-11


E434_ADECC 


Q65962 canine 
adenovirus type 1 
(strain cll), early e4 31
kd protein. 11/98 


4.4 


701 


L17340


Drosophila 
melanogaster 
germline 
transcription 
factor gene, 
complete cds


3.3 


CISY_TETTH 


CITRATE 
SYNTHASE, 
MITOCHONDRIAL 
PRECURSOR (EC 
4.1.3.7) (14 NM 
FILAMENT- 
FORMING 
PROTEIN) 


9.7 


702 


X58170 


M.musculus 
mRNA for t- 
Complex Tcp-10a


2.00E-45 


PME2_LYCES


PECTINESTERASE 
2 PRECURSOR (EC 
3.1.1.11) (PECTIN 
METHYLESTERASE)
(PE2)


7.4 


703 


Z96207 


H.sapiens 
telomeric DNA 
sequence, clone 
12PTEL049, read
12PTELOO049.seq


8.00E-08 


<NONE> 


<NONE> 


<NONE> 


704 


X58430 


Human Hox1.8
gene 


e-146 


HXAA_HUMAN 


HOMEOBOX 
PROTEIN HOX-A10 
(HOX-1H) (HOX-1.8)
(PL) 


4.00E-05 


705 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


6.00E-06 


YN39_SYNP7 


HYPOTHETICAL 9.2 
KD PROTEIN IN 
CYST-CYSR 
INTERGENIC 
REGION (ORF 81) 


0.89 


706 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete
cds 


1.00E-11 


MYSH_BOVIN 


MYOSIN I HEAVY 
CHAIN-LIKE 
PROTEIN (MIHC) 
(BRUSH BORDER 
MYOSIN I) (BBMI) 


0.001 


707 


M19961


Human 
cytochrome c 
oxidase subunit 
Vb (coxVb) 
mRNA, complete 
cds. 


e-123 


OTHU5B 


<NONE> 


3.00E-30 


708 


X68380 


M.musculus gene 
for cathepsin D,
exon 3 


5.00E-04 


42_MOUSE 


ERYTHROCYTE 

MEMBRANE

PROTEIN BAND 4.2 
(P4.2) (PALLIDIN) 


9.9 


709 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


1.00E-11 


TCPA_DROME


T-COMPLEX 
PROTEIN 1, ALPHA
SUBUNIT(TCP-1- 
ALPHA) 


4.3 


710 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-10 


<NONE> 


<NONE> 


<NONE> 


711 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


4.00E-12 


<NONE> 


<NONE> 


<NONE> 


712 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


0.002 


<NONE> 


<NONE> 


<NONE> 


713 


AB018323


Homo sapiens 
mRNA for 
KIAA0780 
protein, partial 
cds 


3.00E-41 


LBR_CHICK


LAMIN B 
RECEPTOR 


3.4 


714 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


6.00E-06 


YM8L_YEAST 


HYPOTHETICAL 
71.1 KD PROTEIN 
IN DSK2-CAT8 
INTERGENIC 
REGION 


3.00E-08 


715 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


4.00E-13 


PSC_DROME


POSTERIOR SEX 
COMBS PROTEIN 


0.6 


716 


L28101 


Homo sapiens 
kallistatin (PI4) 
gene, exons 1-4, 
complete cds 


7.00E-07 


IRKX_RAT 


INWARD 
RECTIFIER 
POTASSIUM 
CHANNEL BIR9 
(KIR5.1) 


5.4 


717 


AC001038 


Homo sapiens 
(subclone 2_h2
from P1 H49)
DNA sequence 


8.00E-09 


MGMT_YEAST


METHYLATED-DNA--PROTEIN-
CYSTEINE
METHYLTRANSFERASE


0.48 


718 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


1.00E-11 


YWDE_BACSU 


HYPOTHETICAL 
19.9 KD PROTEIN 
IN SACA-UNG 
INTERGENIC 
REGION 
PRECURSOR 


1.8 


719 


U01139 


Mus musculus 
B6D2F1 clone 
2C11B mRNA.


e-110 


GSC_DROME 


HOMEOBOX 

PROTEIN 

GOOSECOID 


7.2 


720 


AB017430


Homo sapiens 
mRNA for 
kinesin-like DNA 
binding protein, 
complete cds 


0 


YBAV_ECOLI 


HYPOTHETICAL 
12.7 KD PROTEIN 
IN HUPB-COF 
INTERGENIC 
REGION 


0.17 


721 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


0.001 


CPCF_SYNP2 


PHYCOCYANOBILIN
LYASE BETA
SUBUNIT (EC 4.-.-.-)


2.4 


722 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


9.00E-10 


<NONE> 


<NONE> 


<NONE> 


723 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


0.04 


YKK7_CAEEL 


HYPOTHETICAL 
54.9 KD PROTEIN 
C02F5.7 IN 
CHROMOSOME III 


0.057 


724 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


8.00E-08 


H5_CAIMO 


HISTONE H5 


0.39 


725 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


3.00E-09 


DED1_YEAST 


PUTATIVE ATP- 
DEPENDENT RNA 
HELICASE DED1 


0.5 


726 


J04617 


Human elongation
factor EF-1-alpha
gene, complete
cds. > ::
dbj|E02629|E02629
DNA of human
polypeptide chain
elongation factor-
1 alpha


5.00E-36 


ALU7_HUMAN 


!!!! ALU
SUBFAMILY SQ
WARNING ENTRY
!!!!


0.84 


727 


X54859 


Porcine TNF- 
alpha and TNF- 
beta genes for 
tumour necrosis 
factors alpha and 
beta, respectively. 


3.3 


Z165_HUMAN 


ZINC FINGER 
PROTEIN 165 


5.6 


728 


D49911 


Thermus 
thermophilus 
UvrA gene, 
complete cds 


0.014 


CC48_CAPAN 


CELL DIVISION 
CYCLE PROTEIN 48 
HOMOLOG 


9.9 


729 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


2.00E-06 


CA25_HUMAN 


PROCOLLAGEN 
ALPHA 2(V) CHAIN 
PRECURSOR 


0.011 


730 


D15057 


Human mRNA 
for DAD-1, 
complete cds 


0 


DAD1_HUMAN 


DEFENDER 
AGAINST CELL 
DEATH 1 (DAD-1) 


8.00E-16 


731 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


6.00E-06 


ANFD_RHOCA 


NITROGENASE 
IRON-IRON 
PROTEIN ALPHA 
CHAIN (EC 1.18.6.1) 
(NITROGENASE 
COMPONENT I) 
(DINITROGENASE) 


9.6 


732 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


7.00E-07 


EFTU_CHLVI


ELONGATION 
FACTOR TU (EF- 
TU) 


2.5 


733 


AB018335 


Homo sapiens 
mRNA for 
KIAA0792 
protein, complete 
cds 


0 


TRYM_RAT 


MAST CELL 
TRYPTASE 
PRECURSOR (EC 
3.4.21.59) 


5.6 


734 


X98743 


H.sapiens mRNA
for RNA helicase

(Myc-regulated 
dead box protein) 


0.04 


<NONE> 


<NONE> 


<NONE> 


735 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


2.00E-07 


<NONE> 


<NONE> 


<NONE> 


736 


Z49314 


S.cerevisiae 
chromosome X 
reading frame 
ORF YJL039C 


3.2 


<NONE> 


<NONE> 


<NONE> 


737 


D12646


Mouse kif4 
mRNA for 
microtubule- 
based motor 
protein KIF4, 
complete cds 


0 


KIF4_MOUSE 


KINESIN-LIKE 
PROTEIN KIF4 


2.00E-76 


738 


J04038 


Human 

glyceraldehyde-3- 

phosphate 

dehydrogenase 


2.00E-47 


SDC1_HUMAN 


SYNDECAN-1 
PRECURSOR 
(SYND1) (CD138) 


3.5 


739 


AF010238 


Homo sapiens 
von Hippel- 
Lindau tumor 
suppressor 


1.00E-09 


LIN1_HUMAN 


LINE-1 REVERSE 
TRANSCRIPTASE 
HOMOLOG 


0.001 


740 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-06 


YQJX_BACSU 


HYPOTHETICAL 
13.2 KD PROTEIN 
IN GLNQ-ANSR 
INTERGENIC 
REGION 


9.9 


741 


L21186 


Human lysyl 
oxidase-like 
protein mRNA, 
complete cds. 


e-145 


OXRTL 


<NONE> 


1.00E-34 


742 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


2.00E-05 


CC48_SOYBN 


CELL DIVISION 

CYCLE PROTEIN 48 

HOMOLOG 

(VALOSIN 

CONTAINING 

PROTEIN 

HOMOLOG) (VCP) 


7.6 


743 


AF009203 


Homo sapiens 
YAC clone 
377A1 unknown 
mRNA 
3' untranslated
region 


3.3 


<NONE> 


<NONE> 


<NONE> 


744 


Z74894 


S.cerevisiae 
chromosome XV 

reading frame 

ORF YOL152w 


0.12 


CD14_RABIT 


Q28680 oryctolagus
cuniculus (rabbit),
monocyte
differentiation antigen
cd14 precursor. 11/98


1.9 


745 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 

cds 


9.00E-10 


KIN3_YEAST 


SERINE/THREONINE-
PROTEIN KINASE
KIN3 (EC 2.7.1.-)


2.5 


746 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-05 


YA53_SCHPO 


HYPOTHETICAL 
24.2 KD PROTEIN 
C13A11.03 IN
CHROMOSOME I 


7.00E-17 


747 


S61044 


ALDH3=aldehyde
dehydrogenase
isozyme 3 
[human, stomach, 
mRNA Partial, 
1362 nt] 


0 


DHAP_HUMAN 


ALDEHYDE 
DEHYDROGENASE, 

DIMERIC NADP- 
PREFERRING (EC 
1.2.1.5) (CLASS 3) 


2.00E-71 


748 


U95094


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


2.00E-08 


CA1E_CHICK 


COLLAGEN ALPHA 
1(XIV) CHAIN 
PRECURSOR 
(UNDULIN) 


0.36 


749 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


7.00E-06 


<NONE> 


<NONE> 


<NONE> 


750


L14815 


Entamoeba
histolytica HM-
1:IMSS galactose-
specific adhesin
170kD subunit 
(hgl3) gene, 
complete cds. 


0.12 


<NONE>


<NONE> 


<NONE> 


751 


X63785 


T.thermophila
gene for snRNA
U2-2


1.1


<NONE>


<NONE>


<NONE>


752 


M83756 


Mytilus edulis 
mitochondrial 
NADH 

dehydrogenase 
subunit 5 (ND5) 
gene, 3' end; 
NADH 

dehydrogenase 
subunit 6 (ND6) 
gene, complete 
cds; and 

cytochrome b (cyt 
b), 5' end. 


0.042 


DSC1_HUMAN 


DESMOCOLLIN 
1A/1B PRECURSOR 
(DESMOSOMAL 
GLYCOPROTEIN 
2/3)(DG2/DG3) 


2.6 


753 


AB001066 


Brown trout 
microsatellite 
DNA sequence 


0.38 


IMB3_HUMAN 


IMPORTIN BETA-3 
SUBUNIT 
(KARYOPHERIN 
BETA-3 SUBUNIT) 


1.2 


754 


AF064787 


Lotus japonicus 
rac GTPase 
activating protein 
1 mRNA, 
complete cds 


0.51 


<NONE> 


<NONE> 


<NONE> 


755 


U20608 


Dictyostelium 
discoideum 
unknown spore 
germination- 
specific protein- 
like protein, orf1,
orf2 and orf3 
genes, complete 
cds 


0.043 


<NONE> 


<NONE> 


<NONE> 


756 


M77812 


Rabbit myosin 
heavy chain 
mRNA, complete 
cds. 


1.2 


RBL1_HUMAN


RETINOBLASTOMA-
LIKE PROTEIN 1
(107 KD
RETINOBLASTOMA-
ASSOCIATED
PROTEIN) (PRB1)
(P107)


4.9 


757 


X63789 


T.thermophila 
genes for snRNA 
U5-1, snRNA U5-2


0.058 


<NONE> 


<NONE> 


<NONE>


758


D50646 


Mouse mRNA for 
SDF2, complete 
cds 


2.00E-27 


PMT3_YEAST 


DOLICHYL- 
PHOSPHATE- 
MANNOSE-- 
PROTEIN 

MANNOSYLTRANSFERASE
3 (EC
2.4.1.109) 


0.002 


759 


L81583 


Homo sapiens 
(subclone 3_g2
from P1 H11)
DNA sequence 


3.00E-19 


ALU5_HUMAN 


!!!! ALU
SUBFAMILY SC
WARNING ENTRY
!!!!


0.86 


760 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-06 


SYFA_YEAST 


PHENYLALANYL- 
TRNA 

SYNTHETASE 

ALPHA CHAIN 
CYTOPLASMIC 


5.7 


761 


AF000370 


Homo sapiens 
polymorphic CA 
dinucleotide 
repeat flanking 
region 


6.00E-89 


APP1_MOUSE


AMYLOID-LIKE 
PROTEIN 1 
PRECURSOR 
(APLP) 


5.7 


762 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


0.002 


<NONE> 


<NONE> 


<NONE> 


763 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


7.00E-06 


PSF_HUMAN 


PTB-ASSOCIATED 
SPLICING FACTOR 
(PSF) 


0.72 


764 


AB018288


Homo sapiens 
mRNA for 
KIAA0745 
protein, partial 
cds


0 


TC2A_CAEBR 


TRANSPOSABLE 
ELEMENT TCB2 
TRANSPOSASE 


1.5 


765 


AF020282 


Dictyostelium 
discoideum 
DG2033 gene, 
partial cds 


0.38 


PMT2_YEAST 


DOLICHYL- 
PHOSPHATE- 
MANNOSE-- 
PROTEIN 

MANNOSYLTRANSFERASE
2 (EC
2.4.1.109) 


0.18 


766 


AF017357 


Oryza sativa low 
molecular early

light-inducible 
protein mRNA, 
complete cds 


0.38


RGS3_HUMAN


REGULATOR OF G-
PROTEIN
SIGNALLING 3
(RGS3) (RGP3) 


0.23 


767 


U67599 


Methanococcus 
jannaschii section 
141 of 150 of the 
complete genome 


0.13 


<NONE> 


<NONE> 


<NONE> 


768 


X74178 


B.taurus 

microsatellite 

DNA INRA153


0.13 


FAG1_SYNY3 


P73574 synechocystis 
sp. (strain pcc 6803).
3-oxoacyl-[acyl- 
carrier protein] 
reductase 1 (ec 
1.1.1.100) (3- 
ketoacyl- acyl carrier 
protein reductase 1). 
11/98 


5.00E-16 


769 


AF041858 


Mus musculus 
synaptojanin 2 
isoform delta 
mRNA, partial 
cds 


0.043 


CA44_HUMAN 


COLLAGEN ALPHA 
4(IV) CHAIN 
PRECURSOR 


0.24 


770 


J01404


Drosophila 
melanogaster 
mitochondrial 
cytochrome c 

oxidase subunits, 
ATPase6, 7 
tRNAs (Trp, Cys, 
Tyr, Leu(UUR), 
Lys, Asp, Gly) 
genes, and 
unidentified 
reading frames 
A61, 2 and 3. 


0.021 


NU1M_CITLA 


NADH- 
UBIQUINONE 
OXIDOREDUCTASE 
CHAIN 1 (EC 1.6.5.3)


7.2 


771 


AL022317 


Human DNA 
sequence from 
clone 140L1 on 
chromosome 

22q13.1-13.31,
complete 

sequence [Homo 
sapiens] 


3.00E-41 


ALU7_HUMAN 


!!!! ALU
SUBFAMILY SQ
WARNING ENTRY
!!!!


4.00E-08 


772 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


1.00E-09 


<NONE> 


<NONE> 


<NONE> 


773 


AF095927 


Rattus norvegicus 
protein 

phosphatase 2C 
mRNA, complete 
cds 


0 


P2C_PARTE 


PROTEIN 

PHOSPHATASE 2C 
(EC 3.1.3.16) (PP2C) 


1.00E-16 


774 


X87212 


H.sapiens mRNA 
for cathepsin C 


0 


CATC_HUMAN


DIPEPTIDYL- 
PEPTIDASE I 
PRECURSOR (EC 
3.4.14.1) 


2.00E-46 


775 


X05283 


Drosophila 
melanogaster 
PKCG7 gene 
exons 7-14 for 
protein kinase C 


4.5 


<NONE> 


<NONE> 


<NONE> 


776 


X03558 


Human mRNA 
for elongation 
factor 1 alpha 
subunit 


0 


EF11_HUMAN 


ELONGATION 
FACTOR 1-ALPHA 1
(EF-1-ALPHA-1)


1.00E-83


777 


X06960 


Aspergillus 

nidulans 

mitochondrial 

DNA for 

cytochrome 

oxidase subunit 3, 

tRNA-Tyr 


0.23 


<NONE> 


<NONE> 


<NONE> 


778 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


3.00E-09 


YMT8_YEAST 


HYPOTHETICAL 
36.4 KD PROTEIN 
IN NUP116-FAR3
INTERGENIC 
REGION 


5.00E-07 


779 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


2.00E-07 


NAT1_YEAST 


N-TERMINAL
ACETYLTRANSFERASE 1
(EC 2.3.1.88)


5.00E-23 


780 


U59706 


Gallus gallus
alternatively
spliced AMPA
glutamate
receptor, isoform
GluR2 flop,
(GluR2) mRNA,
partial cds.


0.014


PPOL_SARPE


POLY (ADP-
RIBOSE)
POLYMERASE (EC
2.4.2.30) (PARP)


0.021


781 


U57391 


Rattus norvegicus 
FceRI gamma- 
chain interacting 
protein SH2-B 
(SH2-B) mRNA, 
complete cds 


1.00E-84 


<NONE> 


<NONE> 


<NONE> 


782 


AB014591


Homo sapiens 
mRNA for 
KIAA0691 

protein, complete

cds 


7.00E-57 


SSGP_VOLCA 


SULFATED 
SURFACE 
GLYCOPROTEIN 
185 (SSG 185)


5.3 


783 


AJ008065 


Chrysolina bankii 
16S rRNA gene,
mitotype B2 


0.043 


<NONE> 


<NONE> 


<NONE> 


784 


AF067212 


Caenorhabditis
elegans cosmid 
F37F2 


0.005 


MEKl_RAT 


MAPK/ERK KINASE
KINASE 1 (EC 2.7.1.-)
(MEK KINASE 1)




785 


U95094 


Xenopus laevis 
XL-INCENP 
(XL-INCENP) 
mRNA, complete 
cds 


0.042 


<NONE> 


<NONE> 


<NONE> 


786 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


9.00E-09 


<NONE> 


<NONE> 


<NONE> 


787 


Y13401 


Homo sapiens 
CD3 delta gene, 
enhancer 
sequence 


8.00E-08 


<NONE> 


<NONE> 


<NONE> 


788 


AE001038 


Archaeoglobus
fulgidus section 
69 of 172 of the 
complete genome 


0.13 


<NONE> 


<NONE> 


<NONE> 


789 


U95102 


Xenopus laevis
mitotic
phosphoprotein
90 mRNA,
complete cds


2.00E-06


<NONE>


<NONE>


<NONE>


Table 2


790 


AF041463 


Manihot esculenta 
elongation factor 
1-alpha


1.4 


<NONE>


<NONE>




791 


U95102 


Xenopus laevis 
mitotic 

phosphoprotein 
90 mRNA, 
complete cds 


0.002 


HXA3_HAEIN 


HEME:HEMOPEXIN
-BINDING PROTEIN
PRECURSOR


2.7 


792 


71^1 n 
L\l\ \Z 


pWE15A cosmid
vector DNA 




PKWA_THECU


PUTATIVE
SERINE/THREONINE-
PROTEIN KINASE
PKWA (EC 2.7.1.-)


9 OOP flA 


793 


U85193 


Human nuclear 
factor I-B2 

complete cds 


4.00E-44 


<NONE> 


<NONE> 


<NONE> 


794 


U89331 


Human
pseudoautosomal
homeodomain-
containing protein
(PHOG) mRNA,
complete cds


7.00E-06 


NRL_HUMAN 


NEURAL RETINA- 
SPECIFIC LEUCINE 
ZIPPER PROTEIN 


6.3 


795 


AF055666 


Mus musculus 
kinesin light chain 
2 (Klc2) mRNA, 
complete cds 


0.52 


PSPD_BOVIN 


PULMONARY 
SURFACTANT- 
ASSOCIATED 
PROTEIN D 
PRECURSOR 


0.33 


796 


L13321 


Homo sapiens 
iduronate-2- 
sulfatase (IDS) 
gene, exon 1,
incomplete 5' end. 


0.14 


YRP2_YEAST 


HYPOTHETICAL 
84.4 KD PROTEIN 
IN RPC2/RET1 
3'REGION 


0.27 


797 


AL010270


Plasmodium 

falciparum DNA 

*** 

SEQUENCING 
IN PROGRESS 
*** from contig 
4-96, complete 
sequence 


0.37 


YTH3_CAEEL 


HYPOTHETICAL 
75.5 KD PROTEIN 

C14A4.3 IN 
CHROMOSOME II 


2 


798 


U95098 


Xenopus laevis 
mitotic 

phosphoprotein 
44 mRNA, partial 
cds 


0.015 


IMB3_HUMAN 


IMPORTIN BETA-3 
SUBUNIT 
(KARYOPHERIN 
BETA-3 SUBUNIT) 


0.063 


799 


U70139 


Mus musculus 
putative CCR4 
protein mRNA, 
partial cds 


0 


CCR4_YEAST 


GLUCOSE-REPRESSIBLE
ALCOHOL
DEHYDROGENASE
TRANSCRIPTIONAL
EFFECTOR (CARBON
CATABOLITE
REPRESSOR
PROTEIN 4)


5.00E-11 


800 


L26507 


Mouse myocyte 
nuclear factor 
(MNF) mRNA, 
complete cds. 


3.00E-41 


MNF_MOUSE 


MYOCYTE 
NUCLEAR FACTOR 
(MNF) 


4.00E-18 


801 


U20527 


Mus musculus 
chemokine KC 
gene, 5' region. 


0 


GRO_MOUSE 


GROWTH REGULATED 
PROTEIN PRECURSOR 
(PLATELET-DERIVED 
GROWTH FACTOR- 
INDUCIBLE PROTEIN 
KC) (SECRETORY 
PROTEIN N51) 


1.00E-28 


802 


AF065482 


Homo sapiens

sorting nexin 2 
(SNX2) mRNA, 
complete cds 


0


MYSA_DROME


MYOSIN HEAVY 
CHAIN, MUSCLE 


0.089 


803 


U05823 


Mus musculus 
pericentrin mRNA, 
complete cds. 


1.00E-94


M84D_DROME 


MALE SPECIFIC SPERM 
PROTEIN MST84DD 


0.099 


804 


U67468 


Methanococcus 
jannaschii section
10 of 150 of the 
complete genome 


0.4 


<NONE> 


<NONE> 


<NONE> 


805 


U14178 


Human type II IL-1 
receptor gene, exon 
1B


1.00E-19 


AMPH_HUMAN


AMPHIPHYSIN 


2.9 


806 


L40411 


Homo sapiens 

thyroid receptor 
interactor 


0 


TRI8_HUMAN 


THYROID RECEPTOR 
INTERACTING PROTEIN 
8 (TRIP8) 


4.00E-86 


807 


D17218 


Human HepG2 3' 
region MboI cDNA,
clone hmd3g02m3 


e-136 


CA1A_HUMAN 


COLLAGEN ALPHA 1(X)
CHAIN PRECURSOR 


3.00E-04 


808 


Z57610 


H.sapiens CpG 
DNA, clone 187a10,
reverse read
cpg187a10.rt1a.


e-102 


HN3B_MOUSE 


HEPATOCYTE 
NUCLEAR FACTOR 3- 
BETA (HNF-3B) 


1.00E-24


809 


D14678 


Human mRNA for 
kinesin-related 
protein, partial cds 


0 


NCD_DROME 


CLARET 

SEGREGATIONAL 
PROTEIN 


1.00E-70 


810 


X56317 


Xiphophorus 
maculatus 
Xmrk(proto- 
oncogene) gene for 
receptor tyrosine 
kinase. 


0.49 


WNlB_MOUSE 


WNT-10B PROTEIN 
PRECURSOR (WNT-12)


7.2 


811 


M36200 


Human 

synaptobrevin 1 
(SYB1) gene, exon 
5. 


0.2 


VE2_HPV14 


REGULATORY PROTEIN 
E2 


3.1 


812 


M18157 


Human glandular 
kallikrein gene, 
complete cds. 


1.5 


EKLF_MOUSE 


ERYTHROID 
KRUEPPEL-LIKE 
TRANSCRIPTION 
FACTOR (EKLF) 


1.1 


813 


D25215 


Human mRNA for 
KIAA0032 gene, 
complete cds 


1.9 


YXIS_SACER 


HYPOTHETICAL 28.9 
KD PROTEIN IN XIS 
5'REGION (ORF1) 


1.3 


814 


M96628 


Human gene 
sequence, 5' end. 


2.00E-06 


AGRI_DISOM 


AGRIN (FRAGMENT) 


9.5 


815 


Z57610 


H.sapiens CpG 
DNA, clone 187a10,
reverse read
cpg187a10.rt1a.


e-102 


HN3B_MOUSE 


HEPATOCYTE 
NUCLEAR FACTOR 3- 
BETA (HNF-3B) 


1.00E-19 


816 


X14168 


Human pLC46 with 
DNA replication 
origin 


5.00E-16 


ZN44_HUMAN 


ZINC FINGER PROTEIN 
44 (ZINC FINGER 
PROTEIN KOX7) 


1.6 


817 


M19262


Rat clathrin light 
chain (LCB3) 
mRNA, complete 
cds. 


0.28 


LMA_DROME 


LAMININ ALPHA 
CHAIN PRECURSOR 


4.7 


818 


AF058055 


Mus musculus 
monocarboxylate 
transporter 1 


0.2 


<NONE> 


<NONE> 


<NONE> 


819 


AB014570 


Homo sapiens 

mRNA for 

KIAA0670 protein, 
partial cds 


0.16 


YGR1_YEAST 


HYPOTHETICAL 34.8 
KD PROTEIN IN SUT1- 
RCK1 INTERGENIC 
REGION 


4.00E-06 


820 


M19262


Rat clathrin light 
chain (LCB3) 
mRNA, complete 
cds. 


0.27 


LMA_DROME 


LAMININ ALPHA 
CHAIN PRECURSOR 


4.5 


821 


Z54367 


H.sapiens gene for 
plectin 


0.29 


Y093_CAEEL 


HYPOTHETICAL 58.5 
KD PROTEIN T20B12.3
IN CHROMOSOME III 


1.00E-14 


822 


AB017026


Mus musculus 
mRNA for
oxysterol-binding
protein, complete
cds 


0 


OXYB_HUMAN


OXYSTEROL-BINDING 
PROTEIN 


2.00E-49 


823 


X58170 


M.musculus mRNA 
for t-Complex Tcp- 
10a gene 


1.00E-20


UL52_HSV11


DNA 

HELICASE/PRIMASE 
COMPLEX PROTEIN 
(DNA REPLICATION 
PROTEIN UL52) 


5.3 


824 


X58430 


Human Hox1.8
gene 


0 


HXAA_HUMAN


HOMEOBOX PROTEIN
HOX-A10 (HOX-1H)
(HOX-1.8) (PL)


1.00E-44


825 


X53754 


Porcine 

sarcoplasmic/endopl 
asmic-reticulum 
Ca(2+) pump gene 2 
3'-end region 


1.3 


<NONE> 


<NONE> 


<NONE> 


826 


AB005786 


Arabidopsis thaliana 
tRNA-Glu gene 


0.46 


<NONE> 


<NONE> 


<NONE> 


827 


AB012130 


Homo sapiens 
SBC2 mRNA for 
sodium bicarbonate 
cotransporter2, 
complete cds 


1.9 


<NONE> 


<NONE> 


<NONE> 


828 


AB017430


Homo sapiens 
mRNA for kinesin- 
like DNA binding 
protein, complete 
cds 


0 


YBAV_ECOLI 


HYPOTHETICAL 12.7 
KD PROTEIN IN HUPB- 
COF INTERGENIC 
REGION 


0.063 


829 


AB007886 


Homo sapiens 
KIAA0426 mRNA, 
complete cds 


0.042 


YDF3_SCHPO 


PROBABLE 
EUKARYOTIC 
INITIATION FACTOR 
C17C9.03 


0.52 


830 


AB018335 


Homo sapiens 
mRNA for 
KIAA0792 protein, 
complete cds 


e-172 


UROT_BOVIN


TISSUE PLASMINOGEN 
ACTIVATOR 
PRECURSOR (EC 
3.4.21.68) 


0.86 


831 


D12646


Mouse kif4 mRNA 
for microtubule- 
based motor protein 
KIF4, complete cds


0 


KIF4_MOUSE 


KINESIN-LIKE PROTEIN 
KIF4 


9.00E-96 


832 


U38376 


Rattus norvegicus 
cytosolic 

phospholipase A2 
mRNA, complete 
cds 


0.048 


<NONE> 


<NONE> 


<NONE> 


833 


L40411 


Homo sapiens 
thyroid receptor 
interactor 


0 


TRI8_HUMAN 


THYROID RECEPTOR 
INTERACTING PROTEIN 
8 (TRIP8) 


4.00E-86 


834 


T TH0 1 1 A 


Mus musculus
RNA1 homolog
(Fug1) mRNA,
complete cds. 


o«UUr>U4 


YNW7_YEAST


HYPOTHETICAL 68.8

KD PROTEIN IN URE2- 
SSU72 INTERGENIC 
REGION 




835 


D50646 


Mouse mRNA for 
SDF2, complete cds 


1.00E-40 


YB64_YEAST 


HYPOTHETICAL 57.2 
KD PROTEIN IN MET8- 
HPC2 INTERGENIC 
REGION 


4.9 


836 


D50646 


Mouse mRNA for 
SDF2, complete cds


1.00E-40


YB64_YEAST 


HYPOTHETICAL 57.2 
KD PROTEIN IN MET8- 
HPC2 INTERGENIC 
REGION 


4.9 


837 


U67459 


Methanococcus 
jannaschii section 1 
of 150 of the 
complete genome 


5.00E-05 


GCS1_HUMAN 


MANNOSYL- 
OLIGOSACCHARIDE 
GLUCOSIDASE (EC 
3.2.1.106) 


9.2 


838 


U18657 


Haemophilus 
influenzae LeuA 
(leuA) gene, partial 
cds, DprA (dprA+), 
orf272 and orf193
genes, complete cds, 
and PfkA (pfkA) 
gene, partial cds. 


0.01 


STE6_YEAST


MATING FACTOR A 
SECRETION PROTEIN 
STE6 (MULTIPLE DRUG 
RESISTANCE PROTEIN 
HOMOLOG) (P- 
GLYCOPROTEIN) 


7 


839 


U12523 


Rattus norvegicus 
ultraviolet B 
radiation-activated 
UV98 mRNA, 
partial sequence. 


1.00E-10 


YMT8_YEAST 


HYPOTHETICAL 36.4 
KD PROTEIN IN 
NUP116-FAR3 
INTERGENIC REGION 


2.00E-06 


840 


D78255 


Mouse mRNA for 
PAP-1, complete 
cds 


e-175 


<NONE> 


<NONE> 


<NONE> 


841 


D17263


Human HepG2 3'
region MboI cDNA,
clone hmd5f07m3 


1.00E-58 


<NONE> 


<NONE> 


<NONE> 


842 


AF006751 


Homo sapiens 
ES/130 mRNA,
complete cds 


0.061 


YRP2_YEAST 


HYPOTHETICAL 84.4 
KD PROTEIN IN 
RPC2/RET1 3'REGION 


2.00E-07 


843 


U67459 


Methanococcus 
jannaschii section 1 
of 150 of the 
complete genome 


6.00E-05 


YC14_METJA


HYPOTHETICAL 
PROTEIN MJ1214 


8.1 


844 


D88689 


Mus musculus 
mRNA for flt-1, 
complete cds 


0.084 


ICP0_HSV2H 


TRANS-ACTING 
TRANSCRIPTIONAL 
PROTEIN ICP0 (VMW118
PROTEIN) 


0.014 
Clone Name


Cluster
ID


Clones in
Lib1


Clones in
Lib2


M00001340B:A06 


17062 


3 


0 


M00001340D:F10 


11589 


2 


2 


M00001341A:E12 


4443 


10 


6 


M00001342B:E06 


39805 


2 


0 


M00001343C:F10 


2790 


7 


15 


M00001343D:H07 


23255 


3 


0 


M00001345A:E01 


6420 


8 


0 


M00001346A:F09 


5007


4 


8 


M00001346D:E03 


6806 


5 


2 


M00001346D:G06


5779 


5 


4 


M00001346D:G06 


5779


5 


4 


M00001347A:B10


13576


5 


0 


M00001348B:B04 


16927 


4 


0 


M00001348B:G06


16985


4 


0 


M00001349B:B08 


3584 


5


11 


M00001350A:H01


7187


5 


3 


M00001351B:A08


3162 


10


14 


M00001351B:A08 


3162 


10 


14 


M00001352A:E02


16245 


4 


0 


M00001353A:G12 


8078 


4 


3 


M00001353D:D10 


14929 


4 


0 


M00001355B:G10 


14391 


3 


1 


M00001357D:D11


4059 


8 


6 


M00001361A:A05 


4141 


5 


2 


M00001361D:F08 


2379 


26 


13 


M00001362B:D10 


5622 


7 


4 


M00001362C:H11 


945 


9 


21 


M00001365C:C10


40132


2 


0 


M00001370A:C09 


6867 


7 


3 


M00001371C:E09 


7172 


3 


5 


M00001376B:G06 


17732 


1 


3 


M00001378B:B02 


39833 


2 


0 


M00001379A:A05 


1334 


27 


38 


M00001380D:B09


39886 


2 


0


M00001382C:A02 


22979 


2 


1


M00001383A:C03


39648 


2 


0 


M00001383A:C03 


39648 


2 


0 


M00001386C:B12 


5178 


5 


5 


M00001387A:C05 


2464 


5 


19 


M00001387B:G03 


7587 


6 


2 


M00001388D:G05 


5832 


10 


3 


M00001389A:C08 


16269 


3 


0 



Clones in Clones in Clones in Clones in 



Lib3 


Lib4 


Lib8 


Lib9 


0 


0 


0 


0 


1 


3 


3 


8 


2 


6 


3 


11 


0 


0 


1 


0 


13 


14 


6 


0 


1 


1 


0 


0 


2 


0 


1 


0 


3 


6 


2 


6 


1 


2 


0 


3 


3 


4 


0 


0 


3 


4 


0 


0 


0 


0 


12 


11 


0 


2 


0 


0 


0 


0 


0 


0 


5 


0 


0 


2 


1 


0 


1 


0 


1 


6 


6 


5 


1 


6 


6 


5 


0 


0 


0 


0 


1 


0 


1 


0 


0 


1 


23 


16 


0 


0 


0 


0 


8 


16 


0 


1 


10 


16 


4 


27 


4 


2 


2 


3 


2 


13 


1 


2 


2 


1 


0 


0 


0 


0 


3 


0 


0 


0 


0 


0 


1 


2 


0 


1 


5 


0 


1 


4 


0 


0 


0 


0 


35 


28 


3 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


4 


2 


5 


2 


25 


16 


1 


0 


1 


0 


0 


0 


0 


1 


5 


0 


0 


0 


1 


1 
Table 5 All Differential Data for Libs 1-4 and 8-9



Clone Name 


Cluster 


Clones in 


Clones in 


Clones in 


Clones in 


Clones in


Clones in


ID


Lib1


Lib2


Lib3


Lib4


Lib8


Lib9


mooooi ^Q4A -F01 


OJOJ 


7 


7 




7 


o 


0 


maaaai ^o^ a *ph^ 

MUUUU 1 .I^Uj 


4A1 £ 


.? 


14 


A 
U 


0 


o 


A 
U 


MOOO0 1 'SQfiA •ro'* 


4000 




A 

• 


1^ 

1 J> 


«/ 


4 


10 


Ufififld 1 407 A -FOK 




7 
Z 


u 


u 


A 
U 


A 
\J 


A 


MUUUU 1 4U / r> .iJ 1 1 


DD DO 


0 
0 


1 

1 


j 


A 
U 


7 
z 


A 
U 


m aa a a 1 /lAor* -n i 7 
muuuu 1 4uy\~, xj\l 


Q^77 




Z 


A 
U 


1 
1 


1 1 
1 1 


1 7 
1Z 


MUUUU 1 4 1 UA .UU / 


/UUj 


o 
0 


Z 


A 
U 


A 
U 


A 
U 


A 
U 


MUUUU 1 4 1 ZD .D 1 U 


OJJ 1 


A 


/i 


A 
U 


J 


A 
U 


A 
U 


maaaai 41 ^ a «i4ft£ 

MUUUU 141 jA.riUO 


1 JJJO 




u 


A 
U 


A 
U 


y 


1 
1 


aaaaaa i a\ a a .uni 

MUUUU 141 OA .nu 1 


HCJ1A 
/O /4 




z 


A 
U 




A 
U 


A 
U 


MAAAA1 /1 1 .LI1 1 

MUUUU 141 Or>. HI 1 


oo4 / 


4 


i 
i 




A 

u 


0 


1 
I 


1V/IAAAA 1 /1 1 7 A -I7A7 
MUUUU 141 /A.JdUZ 




Z 


U 


A 
U 


1 

1 


A 
U 


A 
U 


\yfAAAA 1 /i 1 ct> -etai 
MUUUU 141 od .r U3 


00^7 


4 


z 


1 
1 


1 

1 


A 
U 


A 
U 


A/IAaaai a 1 gtvda£ 
MUUUU 141 oU.tSUo 


QC7/C 
OJZO 


"J 


z 


1 
1 


r 

J 


1 
1 


A 
U 


\yfAAAA1 A 91 rMTAI 

MUUUU 1 4Z 1 C .r U 1 


yjfl 


D 


z 


A 
U 


1 


1 1 


1 7 

lz 


\yfAAAA1 >40'2'D .CA7 

MUUUU 1 4z3 Jt> :bU / 


1 jUoo 


4 


A 

u 


A 


A 


A 
U 


A 
U 
6908

UZ70 


9 


4 


c 

J 


J 


1 

1 


o 


MOOO0 1 67QP -F0 1 


7R001 

/ OU7 1 


1 
i 


o 


0 


0 


o 


o 

VJ 


mooooi 67orvno3 






9 


A 
U 


i 
l 


0 
u 


1 

1 


MOOOOI 67QF)-n03 

1V1\J\J\J\J L\J / yLs . LJKJD 


107^1 


J 


7 


A 
U 


i 
i 


o 


1 

1 


MOOO0 1 680D-F08 

iVlvjvjvJU lKJO\JLJ.r\JO 


10S30 
i yjjjy 


9 


1 
I 


1 
1 


0 
u 


1 

1 


o 


MOOOOI 689C-R 17 




4 


o 

VJ 


A 
vJ 


A 

V 


0 


o 


MOOO0 1 686 A -F06 


4622 


7 
/ 


VJ 


4 


7 
z 


J 


o 

VJ 


MOOO0 1 688P 'FOQ 
ivivjvjvjv/ i uoov .r \/7 




u 


7 


O 


7 
z 


o 


J 


MOOOOI 693C-G01 


4^01 


1 0 

i \f 


(i 


7 




i 
i 


1 
l 


MOOOOI 71 6D-H(V5 

iviv/v vjvj i / i vjjlj .11 \ju 


677S2 


1 
i 


o 

VJ 


o 


1 
1 


o 


o 


M00003741D:C09


40108


2


0


0


0


0


0


M00003747D:C05


11476


6


0


0


0


0


0


M000037S9R-R09 


607 


76 


^9 


^0 


77 
/z 


91 

Z 1 


10 


M00003762C:B08


17076


4


0


0


0


0


0


M00003763A-F06 


3108 


14 

i t 


1 1 

1 1 


1 


j 


o 

VJ 


1 

1 


MO00O3774r , *AO3 


67007 


l 


u 


U 


A 

u 


A 
U 


A 
U 


M00003796C:D05


5619 


3 


5 


3 


3 


0 


4 


M00003826B:A06 


11350 


3 


3 


0 


0 


1 


0 


M00003833A:E05 


21877 


2 


1 


0 


0 


0 


1 


M00003837D:A01 


7899 


5 


4 


0 


2 


1 


0 


M00003839A:D08 


7798 


5 


2 


2 


0 


0 


1 


M00003844C:B11 


6539 


8 


3 


0 


0 


0 


1 


M00003846B:D06


6874


6


3


0


0


0


0


M00003851B:D10


13595


4


0


1


0


0


1


M00003853A-D04 


5619 

*/VJ I 7 


3 


5 




3 


0 


4 


M00003853A-F12 

ivi\j\j\j\j jojJA.r 


10515 




1 

1 


0 


1 
i 


1 

1 


2 


moooo3R56r-po2 


4672 


7 




4 


9 

z 


3 

ml 


o 


M00003R57 A *G 1 0 

LVi\J\J\J\J JOJ / / V.VJ I v 


33R9 

JJ07 


4 


1 1 


13 


2 

z 


o 


o 


M00003857 A -H03 


471 R 


4 
• 






2 

z 


4 


6 


M00003R71 PF02 


4573 


5 


7 


9 

Ld 




o 


1 
i 


M00003R7SR-F04 


19977 

l ZZ7 / / 




0 


o 


0 


o 


0 


moooo3R7sr-fo4 

ivi vjvjvjvj .7 o / jD.rv/T 


1 9077 

1 Z;7 / / 




o 


\7 


A 
VJ 


o 


o 


iv4oooo^R7^p*no7 


R47Q 


A 


'I 


1 
1 


1 

X 


9 

z 


4 


mooooi R76i>f 1 9 


770R 




9 


z. 


A 
VJ 


o 

VJ 


1 
1 


moooo3R70r-pi 1 




7 


1 

1 


7 




£ 

VJ 


97 

Z- / 


moooo3 R7QR 'Ti 1 n 

IVIIJVJVJVJ J O / 7D .JL/ 1 VJ 


^1 ^R7 

D 1 JO / 


i 
I 


1 

1 


A 
U 


U 


1 

1 


o 

VJ 


\40ooo3R7orv a 09 

iVlUUVJVJJO lyU.rWJL. 


14S07 


'I 
-> 


1 

1 


A 
U 


A 
U 


*1 

J 


1 
i 


M00003 rrs;p • A 09 

1V1 VJVJ V/ \J J o O .J ./Tl 


1 3S76 




n 
u 


A 


n 

VJ 


19 
1 z 


1 1 


1v400003rr^p*ao9 

1V1 VJVJ VJVJ J o o .j uz 


1 ^^76 




u 


A 
U 


A 
VJ 


19 
1 z 


1 1 


moooo^qo^p-f i n 

1 VI VJVJ VJVJ J !7UOv^.J_/ 1 VJ 


Q9RS 


/ 




A 


u 


i 
i 


9 
z 


1V1UUUUJ7U /JJ.AV7 


j7ouy 


1 
1 


n 
u 


A 


A 

u 


9 
z 


1 
I 


moooo3007f>-t404 

ivivjvjvjvj.ji'u / Ly.n.V/ t T 


1 f\*\ 1 7 

J. Oj 1 / 


-1 


A 

u 


A 
u 


A 

VJ 


A 
VJ 


o 

VJ 


moooo30ooi>po3 

ivivjvjvjvj j yxjyu .v_/U 


R679 

OU / L, 


4 


4 


A 
VJ 


A 
VJ 


o 

VJ 


o 

VJ 


M00003Q1 9RT)01 

ivivjuuvj j ~ 1 zr> . i_ju 1 


19539 


A 


1 


A 
VJ 


1 

1 


o 

VJ 


1 

I 


M00003 0 1 4P -F05 


3000 


0 




o 

o 


1 
1 


7 


13 


M00003999 A «F06 

i VI VJVJ VJVJ J 7ZrZr/\ iDUU 


93955 




o 


1 
1 


1 
1 


0 

VJ 


o 

VJ 


M0000395R A -H02 


1R957 




o 


A 
VJ 


A 
VJ 


o 


o 

VJ 


M0000395R A -H09 


1R057 




o 


A 
VJ 


A 
VJ 


o 

VJ 


o 

VJ 


M00003Q5RP-Prin 
iviv/wyj7jov^.\j 1 v/ 




z. 


0 


A 
VJ 


A 


A 
VJ 


A 
VJ 


M00003QSRP-G 1 0 

IVIVJVJVJVJ J ~ •JOv^.Vj 1VJ 




9 


u 


A 
VJ 


A 

VJ 


A 
VJ 


A 
VJ 


M00003968R-F06 

JVlvvvvJ 7\JOl_>.X \7V7 




9 


o 


1 
1 


4 
• 


A 
VJ 


A 

VJ 


M00003970P-R09 


40199 


9 

Li 


o 


A 
VJ 


A 
VJ 


o 

VJ 


o 

VJ 


M00003974D-F07 


93910 




o 


A 
VJ 


A 
VJ 


o 

VJ 


o 

VJ 


M00003974n-H09 


9335R 




o 


A 
VJ 


A 
VJ 


1 
i 


o 

VJ 


M00003975A-Ci11 


12430 


4 


o 


o 

VJ 


0 

VJ 


o 

u 


o 

VJ 


M00003978RG05 


5693 


7 


4 


1 

1 




1 

1 


1 

1 


M00003981AE10 


3430 


0 

y 


10 


7 


1 


o 

VJ 


o 


M00003982CC02 


2433 

^~«7 J 


10 

Jt V/ 


13 


91 

Z> 1 


1R 
1 o 


o 


R 


M00003983AA05 


9105 




1 
i 


1 
i 


1 
i 


o 


o 

VJ 


M0000409RH* A 06 

1V1 VJVJ U Ut ux, O U . r\ VJ O 




A 


o 
o 


i 
i 


o 


1 
1 


A 
U 


M00004028D:C05 


40073 


2 


0 


i 


0 


0 


1 


M00004031A:A12 


9061 


5 


2 


0 


0 


0 


0 


M00004031A:A12 


9061 


5 


2 


0 


0 


0 


0 


M00004035C:A07 


37285 


2 


0 


0 


1 


0 


1 


M00004035D:B06 


17036 


4 


0 


0 


0 


0 


0 


M00004059A:D06 


5417 


10 


4 


0 


9 


2 


0 


M00004068B:A01


3706 


7 


14 


4 


22 


1 


0 


M00004079R-R05 


17016 


4 


0 

V/ 


0 

\J 


0 


0 


0 


M0000408 1 PT) 1 0 


15069 


7 
j 


0 


0 

u 


1 

1 


0 


0 


M0000408 1 C'D 1 2 


14391 


7 
_* 


1 

1 


0 

yJ 


0 


0 


0 


moooo4086t>go6 




4 


7 


0 


O 
\) 


1 

1 


2 


M00004087D- A 0 1 


6R80 


9 




1 

1 


1 
1 


0 


0 

V 


\40ooo4007n«R i 9 

1V1UUUU4V:7jL/.D 1Z 


5795 




D 


7 

z 


0 


9 
z 


1 

1 


moooo4007t>r 1 9 


5795 






9 
z 


0 


9 


1 

1 


M00004 1 05P' A 04 


7991 
/ZZ1 




9 
z 


7 
Z 


9 
Z 


0 


A 
KJ 


M000041 or a *fo6 


4077 


4 


0 




1 

1 


7 


1 
1 


m 00004 1 1 1 r> a or 

1V1UUUI/4 ill L/./VUo 


6R74 
Do /4 


0 


J 


0 


0 


0 


0 


moooo4i iap-fi i 

iviuuuu4i i4cr 1 1 


1 71 87 


9 
z 


1 
J 


0 
u 


7 
/ 


0 

U 


1 
1 


M00004 1 7RR-H09 


1 7979 

1 J A f z 


7 


9 
z 


0 




0 


A 


M00004 1 46P-P 1 1 


5957 
jZJ / 


z 


0 


J 


c 




95 
Zj 


\/roooo4i fvror 

IV1UUUU4 1 J 1 L/.DVJO 


1 6077 


4 


A 


A 

u 


A 
U 


A 
U 


A 


M00004 1 57P* A HQ 


6455 

OH J J 




1 
1 


0 


A 
U 


0 


0 


M00004 1 aqp-p 1 9 


•^710 


D 


0 
Z 


0 
0 


9 
Z 


9 
z 




\/T00004 171TVR07 
1V1UUUU41 / ILADUj 


4008 


0 


7 


9 
Z 


9 
Z 


9 
Z 


A 
U 


\/toooo41 79p*ri08 

1V1UUUU41 /ZL.L'vO 


1 1 404 


/l 
4 


0 


A 

U 


A 
U 


0 


0 
u 


\aoooo4i S7P-rw7 


1 6709 


j 


A 

u 


A 
U 


A 
u 


A 


A 
U 


X/T000041 R5P*P07 


1 1 447 




1 
1 


A 

u 


A 


0 


0 


M00004 1 Q7D-H0 1 


8910 

OZ 11/ 


9 
z 


u 


0 
u 


0 


0 


0 


moooo4907r*p 1 9 


147 1 1 
143 1 1 


4 
u


A 

u 


A 


1 

1 


9 

z 


M000049 1 9R*P07 


9770 
zj /y 


96 
ZO 


1 7 


4 


9 
Z 


9 
Z 




M000049 1 4P-H05 


1 1451 
1 1 4 j 1 


7 


z 


1 
1 


9 
Z 


1 

1 


1 

1 


M00004997 a -n 1 n 


1 60 1 8 
1 t>y 1 0 


A 
4 


A 
U 


A 


U 


A 


A 


moooo4997r*hoq 

iVi\J UU U4ZZ J D . UK) y 


/ Oryy 


»> 


4 


A 


9 
Z 


1 
1 


0 


moooo4997t>fo4 


19071 
1 zy / 1 


4 
4 


0 


A 


A 
U 


1 
1 


0 


moooo4990r*for 


6455 


'j 


1 

1 


0 


A 


0 


0 


moooo497or -P07 


7919 

/Z1Z 


7 


D 


9 

Z 


1 
1 




0 


M0000426QD- D06 


4Q05 


7 
/ 


u 


7 


1 
1 


7 


1 

1 


M00004975P-P1 1 


16014 

1 Uy 1*t 


7 


0 


0 
u 


1 
1 


0 


0 


M 000042 8 3 R • A 04 


14986 


7 


1 

1 


0 

u 


1 
1 


1 

1 


1 

1 


M00004285B-E08 


56020 


1 
1 


0 


A 
u 


0 
u 


A 


0 


M0000490-^n*F 1 9 


IO7Z 1 


4 


U 


U 


1 


z 


1 

1 


M00004296C:H07 


13046 


4 


1 


0 


1 


0 


0 


M00004307C:A06 


9457 


2 


0 


5 


0 


3 


0 


M00004312A:G03 


26295 


2 


0 


0 


0 


0 


0 


M00004318C:D10 


21847 


2 


1 


0 


0 


0 


0 


M00004372A:A03 


2030 


13 


10 


32 


4 


0 


0 


M00004377C:F05 


2102 


12 


20 


23 


21 


6 


5 
Table 6 All Differential Data for Libs 15-20



Clone Name Cluster ID Clones in Clones in Clones in Clones in Clones in Clones in







Lib15


Lib16b


Lib17


Lib18


Lib19


Lib20


M00001340B:A06 


17062 


0 


0 


0 


0 


0 


0 


M00001340D:F10 


11589 


0 


0 


0 


0 


0 


0 


M00001341A:E12 


4443 


0 


0 


0 


1 


0 


0 


M00001342B:E06 


39805 


0 


0 


0 


0 


0 


0 


M00001343C:F10 


2790 


0 


0 


0 


0 


0 


0 


M00001343D:H07 


23255 


0 


0 


0 


0 


0 


0 


M00001345A:E01 


6420 


0 


0 


0 


0 


0 


0 


M00001346A:F09 


5007 


0 


0 


0 


0 


0 


0 


M00001346D:E03 


6806 


0 


0 


0 


0 


0 


0 


M00001346D:G06 


5779 


0 


0 


0 


0 


0 


0 


M00001346D:G06 


5779 


0 


0 


0 


0 


0 


0 


M00001347A:B10 


13576 


0 


0 


0 


0 


0 


0 


M00001348B:B04 


16927 


0 


0 


0 


0 


0 


0 


M00001348B:G06 


16985 


0 


0 


0 


0 


0 


0 


M00001349B:B08 


3584 


0 


0 


0 


0 


0 


0 


M00001350A:H01 


7187 


0 


0 


0 


0 


0 


0 


M00001351B:A08 


3162 


0 


1 


0 


0 


1 


0 


M00001351B:A08 


3162 


0 


1 


0 


0 


1 


0 


M00001352A:E02


16245 


0 


0 


0 


0 


0 


0 


M00001353A:G12 


8078 


0 


0 


0 


0 


0 


0 


M00001353D:D10 


14929 


0 


3 


1 


0 


5 


0 


M00001355B:G10 


14391 


0 


0 


0 


0 


0 


0 


M00001357D:D11 


4059 


0 


0 


0 


0 


0 


0 


M00001361A:A05 


4141 


0 


0 


0 


0 


0 


0 


M00001361D:F08 


2379 


0 


0 


0 


0 


0 


0 


M00001362B:D10


5622 


0 


0 


0 


0 


0 


0 


M00001362C:H11 


945 


0 


0 


0 


0 


0 


1 


M00001365C:C10 


40132 


0 


0 


0 


0 


0 


0 


M00001370A:C09 


6867 


0 


0 


0 


0 


0 


0 


M00001371C:E09 


7172 


0 


0 


0 


0 


0 


0 


M00001376B:G06 


17732 


0 


0 


0 


0 


0 


1 


M00001378B:B02 


39833 


0 


0 


0 


0 


0 


0 


M00001379A:A05 


1334 


0 


0 


0 


0 


0 


1 


M00001380D:B09 


39886 


0 


0 


0 


0 


0 


0 


M00001382C:A02 


22979 


0 


0 


0 


0 


0 


0 


M00001383A:C03 


39648 


0 


0 


0 


0 


0 


0 


M00001383A:C03 


39648 


0 


0 


0 


0 


0 


0 


M00001386C:B12 


5178 


0 


0 


0 


0 


0 


0 


M00001387A:C05 


2464 


0 


0 


0 


0 


0 


0 


M00001387B:G03 


7587 


0 


0 


0 


0 


0 


0 


M00001388D:G05


5832 


0 


0 


0 


0 


0 


0 


M00001389A:C08 


16269 


0 


1 


0 


0 


0 


0 


M00001394A:F01 


6583 


1 


4 


1 


0 


0 


0 


M00001395A:C03 


4016 


0 


0 


0 


0 


0 


0 




M00001396A:C03


4009


0


0


0


0


0


0




M00001402A-E08 


39563 


o 

V 


0 


0 


A 

0 


A 
U 


A 
(J 




M00001407B-D11 


5556 


o 


0 


0 


A 

0 


A 

0 


A 

u 




M00001409C-D12 


9577 

-7<J / / 


o 

v 


0 


A 

0 


A 

0 


A 

0 


A 
U 




M00001410A:D07


7005


0


0


0


0


0


0




M00001412B:B10


8551


0


0


0


0


0


0




M00001415A:H06


13538


0


0


0


0


0


0




M00001416A:H01


7674


0


0


0


0


0


0




M00001416R-H1 1 

LVlKjyj \J\J 1 T 1 KJLJ .1111 


8847 




0 


0 


0 


A 

0 


A 

0 




M00001417A:E02


36393


0


0


0


0


0


0




M00001418B-F03 

1V1VJVJ V/VJ lii OU .1 VJ.J 


9952 


V/ 


0 


0 


0 


0 


0 




M0000 1 4 1 8 D -R06 

lVlv/U Uv 1 *+ 1 OJ-/ .JUJVJVJ 


8S9fi 


n 

u 


0 


0 


0 


0 


0 




Mooooi49ir-F0i 


QS77 


n 

Vj 


0 


0 


0 


0 


0 




M00001 493R -F07 

1V1VJVJ V/VJ 1 T^JD .J_rfV / 


1 JV/VJU 


\J 


0 


0 


y^ 

0 


Ai 

0 


0 




M0000 1 424R -G09 

ivivjv/v/vj i*t^*ti_> . vj \j y 


10470 




0 


0 


y-v 

0 


A 

0 


A 

0 




M00001 42SR-H0R 


991 CK 


KJ 


0 


0 


y^ 

0 


A 

0 


A 

0 




M0000 1 426D -COS 


4261 


VJ 


0 


1 


0 


0 


1 




M0000149RA-H10 

xvivjv/vvj 1 t T^.o/\ .n. i \j 


841 R9 


A 
KJ 


0 


0 


A 

0 


0 


0 




M00001 499A *H04 


9797 


o 


0 


0 


0 


0 


A 

0 




M0000149QR-A1 1 

IVIV/V/VJVJ 1 *T«w7X> ./A 1 1 


HUJ J 




0 


0 


A 

0 


A 

0 


A 

0 




M00001 429D-D07 

ivi\j\j\j\j i y Ly .Ly\J i 


403Q9 


n 


0 


0 


y^ 

0 


A 

0 


A 

0 




M0000143Qr-F08 




n 

u 


0 


0 


0 


y-k 

0 


A 

0 




MO000 1 442P 'D07 

iV l\J\J\J\J 1 ttiiV/ > i-J\J 1 


1 VJ / -? 1 


n 
u 


0 


0 


0 


y*v 

0 


A 

0 




M00001445A-F05 

lriV/vUV 1 TT J Ail V/«J 


13532 


n 

VJ 


0 


0 


yv 

0 


A 

0 


A 

0 




M00001446AF05 


7801 


n 
u 


0 


0 


yv 

0 


A. 

0 


A 

0 




M00001447A-G03 


10717 

1 VJ / 1 / 


o 

VJ 


0 


0 


#—1 

0 


0 


A 

0 




MO00O 1 448D -POO 


0 
o 


1 

1 


6 


6 


1 


14 


1 




M00001448D-H01 

ITlVvVV X I 1 V i~J aX IV X 


36313 


n 

VJ 


3 


0 


0 


A 

3 


A 

0 




M00001449A-A12 


5857 


n 

VJ 


0 


0 


A. 

0 


0 


0 




M0000144QA-R12 

JVluVUu l*rt7A.D iZr 


*t 1UJJ 


VJ 


0 


0 


A. 

0 


0 


0 




MOOOOl 44Q A -n 1 2 


3681 




0 


A 

0 


0 


0 


0 




M00001449A-G10 


36535 

»/V/«J-/«J 


0 

VJ 


0 


0 


0 


0 


y»v 

0 




M00001449C-D06 


861 10 

OVJ 1 1 VJ 


o 

vr 


0 


A 

0 


A 

0 


A 

0 


A 

0 




M00001450A-A02 


39304 


o 

VJ 


0 


0 


0 


y"k 

0 


y^. 

0 




M00001450A-A11 


32663 

— ' i<VVj 


o 

V 


0 


0 


A 

0 


0 


0 




M00001450AB12 


82498 


o 

V 


0 


0 


Ai 

0 


A 

0 


A 

0 




M00001450A*D08 


27250 


o 

V 


0 


0 


A. 

0 


0 


0 




M00001452A-B04 


84328 


o 

VJ 


0 


0 


A 

0 


A 

0 


A 

0 




M00001452A-B12 


86859 


o 

V 


0 


0 


A\ 

0 


0 


A 

0 




M00001452A-D08 


1 120 

1 1 


VJ 


0 


0 


A 

0 


A 

0 


A 

0 




M00001452A:F05 


85064 


0 


0 


0 


0


0


0




M00001452C:B06 


16970 


0


0 


2 


0 


1 


0 




M00001453A:E11 


16130 


0 


0


0 


0


0 


0 




M00001453C:F06 


16653 


0 


0 


0 


0 


0 


0 




M00001454A:A09 


83103 


0 


0 


0 


0 


0 


0 




M00001454B:C12 


7005 


0 


0 
0


0 


0 


M00001454D-G03 


689 

XX XX -X 


o 

XX 


1 


2 


A 


4 


2 


M00001455A-E09 

AVXX/XxVxV A 1 1 m.*m—J\JS 


13238 


o 


A 

u 


A 

U 


A 


A 
0 


A 


M00001455B-E12 

* ™ *■ v V XX XX X. 1 ^X w*S A^X m *- J A M 


13072 


o 


A 
U 


A 
U 


A 

0 


A 

u 


A 

0 


M00001455D-F09 


9283 


o 


A 


A 
U 


A 


A 
U 


A 

u 


M00001455D-F09 


9283 


o 


A 
U 


A 


A 


A 

u 


A 


M00001460A-F06 


2448 


o 


A 

0 


A 

0 


A 

0 


A 

0 


A 

0 


M00001460A:F12


39498


0


0 


0 


0 


0 


0 


M00001461A-D06 

atx vu xx vx i r vx a i a. « a,/ v/v 


1531 

1 ~J -J X 


o 


0 


A 

0 


0 


A 

0 


A 

0 


M00001463CB11 

XTiWVfw X ~\x»x\_ ✓ « J—P 1 i. 


19 


2 


13 


13 


A 

0 


69 


1 A 

10 


M00001465A-B11 


10145 


o 


0 


fx 

0 


0 


0 


0 


M00001466A-E07 


4275 


o 


A 


A 

0 


A 

0 


A 

0 


A 

0 


M00001467A:B07


38759


0


0 


0 


0 


0 


0 


M00001467A-D04 


195fi8 


n 

u 


A 

0 


0 


0 


0 


0 


M00001467A-D08 


16283 


o 

\j 


A 

0 


0 


XX 

0 


0 


XX 

0 


M00001467A-D08 




n 
u 


A 

0 


0 


0 


0 


0 


M00001467A:E10


39442 


0 


0 


0 


0 


0 


0 


M00001468A-F05 


7589 

/ JO/ 




A 

0 


0 


0 


0 


XX 

0 




12081 


o 


A 

0 


0 


0 


0 


xx 

0 


M00001469A-H12 


19105 

XXX 


o 


0 


0 


XX 

0 


X*\ 

0 


0 


M00001 470 A *B 1 0 


10^7 




0 


0 


XX 

0 


xx 

0 


xx 

0 


M00001470A*C04 


39425 


o 


A 

0 


0 


XX 

0 


0 


xx 

0 


M00001471 A-R01 


J7T / O 




A 

0 


0 


0 


0 


xx 

0 


M00001481D-A05 


7985 




A 

0 


0 


XX 

0 


0 


xx 

0 


M00001490B-C04 


l o\jyy 




A 

0 


0 


XX 

0 


0 


XX 

0 


M00001494D*F06 


7206 


o 

KJ 


0 


0 


XX 

0 


0 


xx 

0 


M00001497A-G02 


2623 


KJ 


0 


0 


0 


XX 

0 


xx 

0 


M00001499B-A1 1 


105^9 


o 


0 


0 


XX 

0 


XX 

0 


A 

0 


M00001500A-C05 

iTiv v w x */ vx \x / a • x»^ v/ 


5336 


o 


0 


0 


XX 

0 


XX 

0 


0 


M00001500A-E11 

i'l vx V \7 v A *^ V/ V/ x A • Av A A 


2623 


o 


0 


0 


XX 

0 


XX 

0 


XX 

0 


M00001500C*E04 

a " *- xx v v/ v/ x %x \x \x ■ jl^x/ r 


9443 

-/ 1 1 X 


o 


A 

0 


0 


0 


0 


xx 

0 


M00001501D-C02 


9685 




A 

0 


XX 

0 


XX 

0 


xx 

0 


0 


M00001504C:A07 


10185 


o 


A 

0 


0 


0 


0 


xx 

0 


M00001504OH06 

IfivVvV A mJ XX^T Xw ll A \f\J 


6974 


o 


A 

0 


0 


0 


0 


0 


M00001504D-G06 


6420 


o 


A 


A 

0 


0 


0 


XX 

0 


M00001507A:H05 


39168 

^X 7 -X 7 A XX Vx 


o 


A 


0 


0 


0 


XX 

0 


M00001511A:H06 


39412 


o 


A 

u 


0 


0 


0 


0 


M00001512A-A09 

■* * * XX XX XX XX A %X A AM A • i Av \J _X 


39186 

•/X 1 \J\J 


o 


A 

0 


0 


0 


0 


xx 

0 


M00001512D:G09 

-* >^ XX 4- fcX A * ^ ^ ^^_x xx ^X 


3956 
*j y \j 


o 


A 

0 


1 

1 


0 


xx 

0 


xx 

0 


M00001513A:B06 


4568 


o 


A 

0 


0 


0 


0 


xx 

0 


M00001513C-E08 

XX X* XX XX X. «X A ^X Xk^X ■ ft -J Vr 


14364 

X "TJ V/™ 


o 


A 

0 


0 


XX 

0 


xx 

0 


x\ 

0 


M00001514C:D11 


40044 


0 


1 

1 


0 


0 


XX 

0 


xx 

0 


M00001517A:B07 


4313 


0 


0 


0 


0 


0 


0 


M00001518C:B11 


8952 


0 


a 
u 


u 


(J 


u 


A 

u 


M00001528A:C04 


7337 


0 


0 


0 


0 


0 


0 


M00001528A:F09 


18957 


0 


0 


0 


0 


0 


0 


M00001528B:H04 


8358 


0 


0 
0


0 


0 


M00001531A:D01 


38085 

•x Vx vx V 


o 


U 


A 
U 


A 


A 

0 


A 

0 


M00001532B:A06 


3990 

w" X XX 


1 

X 


I 


A 
0 


0 


0 


0 


M00001533A:C11 


2428 

M/ 1 VX 


o 




1 
1 


A 

0 


A 

0 


A 

0 


M00001534A:C04 


16921 

X XX -X xw X 


o 

Vx 


A 
U 


A 
U 


A 

0 


A 

0 


0 


M00001534A:D09 


5097 

*X XX -X 7 / 


o 

XX 


A 
U 


A 


A 

0 


0 


0 


M00001534A-F09 

* * NX KS XX ^* ^X 11 X • 4 VX ^X 


5321 

t/ XW JL 


o 


1 
1 


A 


A 

0 


2 


0 


M00001534C:A01 


4119 


o 


A 
U 


A 

0 


0 


0 


0 


M00001535A:B01 


7665 


o 


A 
U 


A 

u 


A 

0 


A 

0 


0 


M00001535A:C06 


20212 


o 

Vx 


A 
U 


A 
U 


A 

0 


A 

0 


0 


M00001535A:F10 


39423 

*X -X 1 XW 


o 


A 
U 


A 


A 

0 


A 

0 


0 


M00001536A:B07 


2696 

Xw XX _X^ XX 


o 


A 

u 


A 


A 

0 


3 


0 


M00001536A:C08 

■ %»» X^ XX AL fcX XX A & • VX VX 


39392 


o 


A 


A 

0 


0 


0 


0 


M00001537A-F12 

* T v v V -*. %X t X A « A. A ma* 


39420 


o 


0 


0 


0 


XX 

0 


0 


M00001537B:G07 


3389 


o 


0 


A 

0 


i-V 

0 


0 


0 


M00001540A-D06 

X » A V V V/ XX JL I XX 4 Jk • -X V/ Vx 


8286 


o 


A 

u 


A 

0 


0 


0 


0 


M00001541A-D02 

X ' A VX V XX XX J. I A i Jk • * -X Vx AW 


3765 




A 


0 


0 


0 


XX 

0 


M00001541A-F07 


22085 


n 


u 


0 


0 


0 


0 


M00001541A*H03 

* » V VX V XX A- *X 1 X J Jk • X l\f +s 


39174 


o 


0 


0 


0 


XX 

0 


0 


M00001542A-A09 

■* * * V V XX V A W i iwi k v I Jk V/ .X 


221 13 


o 


0 


0 


0 


A 

0 


0 


M00001542A:E06 


39453 


o 


U 


0 


0 


0 


A 

0 


M00001544A-E03 


12170 


o 

KJ 


0 


0 


XX 

0 


XX 

0 


0 


M00001544A-G02 


19829 


o 


0 


0 


XX 

0 


XX 

0 


^x 

0 


M00001544B:B07 


6974 


0


0 


0 


0 


0 


0 


M00001545A-C03 


19255 


o 


0 


0 


/X 

0 


XX 

0 


0 


M00001545A:D08


13864


0


0 


0 


0 


0 


0 


M00001546A-G11 


1267 


1 

1 


A 


0 


0 


7 


XX 

0 


M00001548A:E10 


5892 

*X XX -X Aw 


o 


A 

u 


0 


0 


0 


^x 

0 


M00001548A:H09 


1058 


o 


A 
U 


l 


0 


0 


#x 

0 


M00001549A:B02 


4015 

I Vx X 


o 


A 
U 


A 

0 


0 


0 


0 


M00001549A:D08 


10944 


o 


A 
U 


0 


0 


0 


^x 

0 


M00001549B:F06 


4193 


0


0


0 


0 


0 


0 


M00001549C:E06 


16347 


o 


A 

0 


0 


XX 

0 


XV 

0 


0 


M00001550A:A03


7239 


o 


A 


0 


XV 

0 


0 


0 


M00001550A:G01


5175 

*x A § *J 


o 


A 


0 


0 


0 


>x 

0 


M00001551A:B10

^ w X^ A %J 2\ X X * JLX X VX 


6268 


o 


A 


0 


0 


0 


0 


M00001551A:F05 


39180 

— ' -X XX XX 


o 


A 
U 


A 

0 


0 


0 


0 


M00001551A:G06 


22390 


o 


A 


0 


0 


0 


0 


M00001551C:G09 


3266 


o 


A 
U 


I 


0 


0 


/X 

0 


M00001552A:B12 


307 


o 


A 
U 


0 


0 


3 


xx 

0 


M00001552A:D11 


39458 

»^ X' ■ W XX 


o 


A 


0 


0 


XX 

0 


A 

0 


M00001552B:D04 


5708 


0 


1 
1 


A 

0 


0 


0 


XX 

0 


M00001553A:H06 


8298 


0 


0 


0 


0 


0 


0 


M00001553B:F12 


4573 


0 


o 


o 

KJ 


A 

w 


A 

u 


u 


M00001553D:D10 


22814 


0 


0 


0 


0 


0 


0 


M00001555A:B02 


39539 


0 


0 


0 


0 


0 


0 


M00001555A:C01 


39195 


0 


0 
0


0 


0 


M00001555D:G10 


4561 

r fcX vx x 


0 

XX 


A 
U 


U 


0 


0 


0 


M00001556A:C09 


9244 


o 

XX 


0 


A 

0 


A 
0 


0 


0 


M00001556A-F11 


1577 


o 


A 
U 


A 


0 


0 


0 


M00001556A:H01 


15855 

X %,X XX *-X fcX 


3 

teX 


r 


C 


0 


3 


1 


M00001556B-C08 

a » i\/ v v v x w +s vx Jm/ ■ x^/ xx vx 


4386 


1 

A 


Z 


A 

0 


0 


0 


0 


M00001556B:G02 


11294 

X X XW 1 


o 


A 


0 


0 


0 


0 


M00001557A-D02 

* xv v w xx A *x # x Jl ■ 1 -x V/ A* 


7065 


o 


0 


0 


0 


XX 

0 


xx 

0 


M00001557A-D02 


7065 


o 


A 


0 


0 


0 


XX 

0 


M00001557A-F01 

* » * VX VX VX V X J X X » A \^ X 


9635 

S \J ~J *J 


o 


A 
(J 


0 


0 


0 


XX 

0 


M00001557A-F03 

a * * v v xx v x ^ / 4. x m X VX *S 


39490 


o 


A 


0 


XX 

0 


XX 

0 


xx 

0 


M00001557B:H10 

3. » j. VX VX VX VJ X *^>' ^X I M^F #X X X- '-X 


5192 


o 


A 
0 


0 


0 


0 


0 


M00001557D-D09 

X T A V v V V/ X ^x / * fXV Xx ^ 


8761 


o 


0 


0 


0 


XX 

0 


xx 

0 


M00001558B-H11 

w X VX vx V XX X VX X -X^ • X X X X 


7514 


o 


0 


0 


XX 

0 


XX 

0 


xx 

0 


M00001560D-F10 


VX^/»^ o 




0 


0 


XX 

0 


xx 

0 


0 


M00001561A*C05 

X T x XX XX V V XI »#/ \/ X X JK • V_> V/v' 


39486 

~/ X~OV7 


o 


0 


0 


xx 

0 


XX 

0 


xx 

0 


M00001563B-F06 

X T *v \x V V X «x XX? ^X 1 * p X Vx xx^ 


102 


22 


38 


65 


7 


43 


-g xx 

10 


M00001564A*BP 


5053 


o 

V 


0 


1 


XX 

0 


.xx 

0 


0 


M00001571C-H06 

X T x Vf V V \/ X ^X / X • X XV/ V/ 


5749 


o 


0 


0 


0 


0 


0 


M00001578B-E04 


23001 


0 


0 


XX 

0 


0 


0 


0 


M00001579D-C03 


6539 


o 


0 


0 


0 


xx 

0 


0 


M00001583D:A10

XT X XX V/ V» X/ X i-' U X^/ 7 «AX X V/ 


6293 


n 


0 


0 


0 


xx 

0 


0 


M00001586C-C05 

x ▼ x xx v V XX X wx Vx xx ^> ■ \v v»/ 


4623 


o 


0 


0 


0 


1 


0 


M00001587A-B11 


39380 




0 


XX 

0 


0 


0 


0 


M00001594B-H04 

X ▼ A XX V V V X »-/ -X^ • X^X « X lv I 


260 


A 
\j 


0 


xs 

0 


xx 

0 


1 


0 


M00001597C-H02 


4837 


o 


0 


0 


xx 

0 


0 


0 


M00001597D-C05 


10470 


0 

VJ 


0 


0 


xx 

0 


xx 

0 


0 


M00001598A-G03 


16999 

i \j y y j 


1 

1 


1 


1 


0 


XX 

0 


0 


M00001601A*D08 

x » x v v xx x xy w xx x * ' . xy 


2219A 

§ y~ 


n 


0 


0 


0 


0 


0 


M00001604A-B10 

■* ~ X W X^ V V X XX XX 1 X X t X %x 


1399 

a J y y 


o 


0 


xx 

0 


xx 

0 


XX 

0 


0 


M00001604A-F05 

x t x v/ vx XX vx X VvTiXi* \-J*x 


393Q1 

J y J j7 1 


o 

VJ 


0 


0 


0 


0 


0 


M00001607A-E11 

\x XX X XX XX / X X • * f x X 


1 1465 


o 


0 


XX 

0 


xx 

0 


0 


0 


M00001608A:B03

x. ▼ xxx XX XX XX X XX XX VxX m* * fc -* y * 


7802 


o 


0 


XX 

0 


^x 

0 


0 


0 


M00001608B:E03 


22155 

x^.«w X *s +s 


o 


0 


0 


XX 

0 


xx 

0 


0 


M00001614C:F10 


13157 




A 

u 


0 


0 


XX 

0 


xx 

0 


M00001617C:E02 


17004 

x # xx Vx r 


o 


A 

0 


0 


0 


1 


0 


M00001619C:F12 


40314 


o 


A 
U 


0 


0 


0 


XX 

0 


M00001621C:C08 


40044 


o 


1 


0 


0 


XX 

0 


Xv 

0 


M00001623D:F10 


13913 

A mj y A a«r 


o 


A 


0 


0 


xx 

0 


Xv 

0 


M00001624A:B06 


3277 


o 


0 


XX 

0 


0 


0 


0 


M00001624C:F01 


4309 


o 


0 


0 


0 


0 


0 


M00001630B:H09


5214 


1 


0 


0 


1 


1 


0 


M00001644C:B07 


39171 


0 


0 


0 


0 


0 


0 


M00001645A:C12 


19267 


0 


n 
u 


A 

u 


u 


1 


u 


M00001648C:A01 


4665 


0 


0 


0 


0 


0 


0 


M00001657D:C03


23201 


0 


0 


0 


0 


0 


0 


M00001657D:F08 


76760 


0 


0 
0


0 


0 


0 


maaaa i ft^C'A no 

MUUUU 1 OOZl^.AUy 


9^91 R 
ZJZ 1 o 


A 
U 


A 

u 


A 
U 


A 
U 


A 
U 


A 
U 


maaaai ^7 a -paa 

1V1 UUU v J OO J A . E/U4 


/ * 4 \7A9 
J J /UZ 


A 
U 


A 
U 


A 

U 


A 

U 


A 

U 


A 

U 


m aaaa 1 ^or-fo? 


040o 


A 
U 


A 
U 


A 

u 


A 

U 


A 

U 


A 

u 


maaaa 1 ^ap-hyy? 

lYl UUUU I O / Uv^ .11 UZ 


14JO/ 


A 
U 


A 
U 


A 

u 


A 

u 


A 

u 


A 

u 


X/TAAAA 1 ^7^P»T4H0 
MUUUU 10/ Ji^.riUZ 


7A1 < 
/Ul J 


A 
U 


A 

U 


A 
U 


A 
U 


A 

u 


A 
U 


\/aaaa i £7^ a -pao 

MUUUU 10/ j A.v^uy 


R77T 
5/ /J 


A 
U 


A 

U 


A 
U 


A 
U 


A 

u 


A 

U 


\ >f AAAA 1 CHCJO .t?ac 

MUUUU I o /ojd :r U j 


1 1 yl £A 

1 14oU 


A 
U 


A 


A 

0 


A 

0 


A 
U 


A 
(J 


\/i a a aa 1 fnic -Pin 

MUUUU ID / /\^.lLI\J 


1 AA97 
1 40Z / 


A 
U 


1 
1 


A 

u 


A 
U 


A 

u 


A 

u 


\>f AAAA 1 £771*V A A7 
MUUUU 10// U.AU / 


7^7A 
/J /U 


A 

U 


A 
U 


A 


A 


A 


A 
U 


\/f aaaa i £72rvF 1 9 
iviuuuu 10/ oL/.r i z 


44 10 


A 

u 


A 

u 


A 
U 


A 

u 


A 
U 


A 
U 


\/f aaaa i £70 a • a a^ 

MUUUU 10/ yA.AUO 


ooou 


A 

u 


A 

u 


A 
U 


A 

u 


A 
U 


A 
U 


maaaa 1 ^70 a *f i a 

MUUUU 1 0 / y A .r 1 U 


Zoo / J 


A 

u 


A 
U 


A 
U 


A 

u 


U 


A 
U 


X/TAAA A 1 A7GT5 -17 A 1 

MUUUU 10/ yjt> .r U 1 


OZyo 


A 
U 


A 
U 


A 
U 


A 

u 


A 
U 


A 
U 


maaaai ^7or^«irni 
MUUUU 1 o /yt.r U 1 


70AA1 

/ouy i 


A 
U 


A 
U 


A 


A 

0 


A 


A 

0 


\4 aaaa i £7orvrw3 
muuuu l o / yu.uuj 


1 A7^1 

1U/J1 


A 
U 


A 
U 


A 


A 


A 


A 


a/aaaai ^TArvriAO 
MUUUU 1 o /yu.UVD 


1 AHf 1 
lU/51 


0 




A 

0 


0 


A 

0 


A 

0 


\>taaaa i £oafvt?aq 
MUUUU i OoUD.r Uo 


lUjjy 


A 
U 


A 
1) 


A 

u 


A 

0 


A 

0 


A 

0 


\aaaaai /coo/^.n io 
MUUUU 1 ooZC: Jt> I z 


1 /Ujj 


A 
U 


A 


A 

0 


A 

0 


A 

0 


A 

0 


\yf AAAA 1 ACA A -T?n/£ 

MUUUU 10ouA.t,UO 


40ZZ 


A 
U 


A 


A 

0 


A 

0 


A 

0 


A 

0 


\vJaaaa 1 aqq/^.itao 
MUUUU lOooL.rUy 


JJOZ 


A 


A 


A 

0 


A 

0 


A 

0 


A 

0 


\>taaaai AQiP'f^Ai 

MUUUU 1 Oy J L.UU 1 


4jyj 


A 
U 


A 
U 


A 

u 


A 


A 


A 


"maaaa i7i £tvu a< 
NLVvvv 1/1 oU.rlUD 


o /ZjZ 


A 
U 


A 
U 


A 

0 


A 

0 


A 


A 

0 


\yfAAAA77/1 1 n.r^o 

muuuu j /4 1 u.cuy 


4UlUo 


A 
U 


A 


A 

0 


A 

0 


A 

0 


A 

0 


\ yf A A A A "3 7/1 7 • A ^ 
MUUUUJ /4 /U.V/UD 


1 1 /1 7 A 
1 14 /O 


A 
U 


A 
U 


A 


A 

0 


A 

0 


A 


\AAAAA77COO .15 AA 

muuuuj /->yh>:h>uy 




A 
U 


0 


0 


0 


1 


0 


\4AOAA'?7^9P'RA8 
MUUUUJ /OZV^.UUo 


1 7A7A 
1 /U /O 


A 
U 


A 

u 


A 


A 


A 


A 


\/f AAAA77£7 A »I7A£ 

MUUUUJ /OJA.rUu 


7 1 AO 

jIUo 


A 
U 


A 
I) 


A 

0 


A 

0 


A 

0 


0 


N/fnAAA177AP' A f\1 
MUUUUJ / /4CAUJ 


A7QA7 

o/yu/ 


A 
U 


A 
U 


A 

u 


A 


A 

u 


A 


\A A AA A "3 7Q£P »nA^ 

muuuuj /you.uuj 


joiy 


A 
U 


A 
U 


A 

0 


A 

0 


A 


A 

0 


N/TAAAmO^T** A AA 
MUUUUJ oZOo . AUO 


1 1 JjU 


A 
U 


A 
U 


A 

0 


A 

0 


A 

0 


A 

0 


\/fAAAA^S^1 A «PA^ 
MUUUUJ OJ JA.EAjJ 


91 577 
Zl O / / 


A 
U 


A 
U 


A 

0 


A 

0 


A 

0 


A 

0 


N4AAAA'5Q > 57r>' A A1 
MUUUUJ OJ /JLJ.AU1 


/oyy 


A 
U 


u 


A 

0 


0 


0 


0 


N/TAAAA^R^Q A »nAR 
MUUUUJOJ "A. L/Uo 


770 8 

/ /yo 


A 

u 


A 

u 


A 

u 


A 


A 


A 

0 


X^AAAAQ Cil /l 1 1 
MUUUUJ 544t.D 1 1 


ojjy 


A 
U 


A 


0 


0 


0 


0 


1V1UUL/U J 04DD .L/UO 


Oo /4 


A 
U 


A 

u 


i 
1 


A 


A 

u 


A 


MOAAA^R^I R-FH A 
IVIUUUU J o J 1 1> ,XJ 1 u 


i j jyj 


A 
U 


A 

u 


A 
U 


A 

0 


A 

0 


A 

0 


IVIUUUUJ O J J r\..LJ\J L t 


joi y 


A 
U 


A 
U 


A 
U 


A 


A 


A 


MfiAAA'JR^'* A -F19 
MUUUUJoJ J A. r 1Z 


1 A^ 1 ^ 
1Uj i j 


A 

u 


A 
U 


A 

0 


A 

0 


A 

0 


A 

0 


IVIUUOUJOJOU.^UZ 


40ZZ 


A 
U 


A 
U 


A 


A 


A 

0 


A 

0 


Mnnoo/*R^7A -mo, 

iVlUUl/UJoJ / J\.\J 1U 


j joy 


A 
U 


A 

u 


A 

u 


A 

0 


A 


A 

0 


M00003857A:H03   4718        0                 0                 0                 0                 0                 0

M00003871C-E02 


4573 


A 

V/ 


A 


A 

u 


A 

u 


u 


A 
U 


M00003875B:F04   12977       0                 0                 0                 0                 0                 0
M00003875B:F04   12977       0                 0                 0                 0                 0                 0
M00003875C:G07   8479        0                 0                 0                 0                 0                 1
M00003876D:E12   7798        0                 0                 0                 0                 0                 0

Table 6 All Differential Data for Libs 15-20

Clone Name       Cluster ID  Clones in Lib15   Clones in Lib16b  Clones in Lib17   Clones in Lib18   Clones in Lib19   Clones in Lib20


MUUUUjo lyo.K^ 1 1 


5345 


A 

u 


A 

u 


A 
0 


2 


A 
0 


1 

1 


N/rnnnmsooR-rn n 

MUUUUJO /7D.U [\J 


J I JO / 


A 
U 


A 

u 


A 
U 


A 
0 


A 
U 


A 

u 


MUUUUjo /yU.JWZ 


1 a ^no 
I4jU / 


A 
U 


A 

0 


A 
0 


A 
0 


A 
U 


A 

u 


MUUUUJ oo2> ,/\ UZ 


uj /D 


U 


A 
U 


A 
U 


A 
0 


A 
0 


A 

u 


MUUUUjooOL'.AUZ 


1 J J /D 


A 

u 


A 
U 


A 
0 


A 
0 


A 
0 


A 
U 


x/innnmon/^f-r? 1 n 
MUUUUiyUoCb 1U 


yzo3 


0 


A 
U 


A 
0 


A 

0 


A 

0 


A 
0 


A./innnmonorv a no 
MUUUU j y\J / U. AUV 


3y©uy 


0 


A 
U 


A 

0 


A 

0 


A 
0 


A 
0 


MUUUU jyu /L/.riU4 


1 £.1 1 O 

163 1 / 


A 

0 


A 

0 


A 

0 


A 

0 


A 
0 


A 
0 


\/fnnnmonorvr , n'5 

Muuuujyuyjj.cuj 


OO /Z 


A 

0 


A 
0 


A 

0 


A 

0 


A 
0 


A 
0 


X/fnnnn^m od .thai 
MUUUUjyizr>:DUl 


1 OC50 

12532 


0 


A 

0 


A 

0 


A 

0 


A 

0 


A 

0 


A/iAnnA'2ni /Ip.eac 
MUUUU jy 14C:r 05 


T AAA 

3900 


0 


A 

0 


A 

0 


0 


l 


A 

0 


aaaaaao aoo a .t?a/c 

MUUUU3y22A:fcU6 


23255 


0 


A 

0 


0 


0 


A 

0 


A 

0 


MUUUU J y 5 o A :HU2 


18957 


0 


A 

0 


0 


0 


0 


A 

0 


X/fAAAAOACO A ,UA1 

MUUUU j y 5o A :HU2 


1 oAn 

18957 


0 


0 


0 


0 


0 


0 


TV yf AAAA'} OCOP.r 1A 

MUUUUjy3oC:(j lu 


a r\ a c c 

40455 


0 


A 

0 


0 


0 


0 


0 


X A AAAA1 ACOr'.r 1 A 

MUU003958C:U 10 


A A A C E 

40455 


0 


0 


0 


0 


0 


0 


\>f AAAA*5 A^COT5.T?A/T 

M0O003968B:JrO6 


O A A OO 

24488 


0 


0 


0 


0 


0 


0 


A /f AAAA1 AT f\(~* .TD AA 

MUUUUiy /UL:B09 


A A 1 

40122 


0 


0 


0 


0 


0 


0 


A A AAAA1 AT/tTVC AT 

M00003974D:b07 


23210 


0 


0 


0 


0 


0 


0 


MUUUU J y /4L):hlUz 


23358 


A 

0 


A 

0 


0 


0 


0 


0 


MUUUU3975A:G1 1 


12439 


0 


0 


0 


0 


0 


0 


MUUUU39 /od.vjUj 


5693 


A 

0 


A 

0 


A 

0 


0 


A 

0 


A 

0 


A/rAAAAO AO 1 A .E1A 

Muuuu3y©i A:b iu 


1 /OA 

3430 


A 

0 


0 


0 


0 


l 


0 


A/fAAAAIftOOP.r'A'1 

Muuuujyozdcuz 


2433 


A 

0 


0 


0 


0 


0 


0 


AA AAAAI AO^ A . A AC 

MUUUU3yo3A.AUj 


A1 AC 

9105 


A 

0 


A 

0 


0 


0 


0 


0 


AAf\ AAA/1 AOOn. A A^C 

MUUUU4UZoD:AUo 


6124 


A 

0 


0 


0 


0 


0 


0 


\/rnnnn/t noor\.on< 
MUUUU4UZ5L/.CU5 


400 /3 


A 

0 


A 

0 


0 


0 


0 


A 

0 


A/rnnnn/im i a . a i o 
MUUUU4UJ 1 A: A 12 


AA/C 1 

9061 


A 

0 


0 


0 


0 


0 


0 


Xvinnnn/tn^ i a.a to 

MUUUU4U3 1 A. A 1 Z 


yooi 


A 

0 


A 

0 


0 


0 


0 


0 


AAf\f\f\f\A f\1 C/" 1 . A A*7 

MUUUU4U35L:AU7 


0'70 OC 

37285 


A 

0 


0 


0 


0 


0 


0 


n /rn nnn a a ^ c . o n^c 
MUUUU4U35D:BUo 


17036 


A 

0 


A 

0 


0 


0 


0 


0 


KvfnnnA/i nc.o a «f\a/: 
MUUUU4U5yA.JJUo 


C >1 1 H 

5417 


A 

0 


A 

0 


0 


0 


0 


0 


N/rnnnn/! a^od. a ai 
MUUUU4U0o d , A U 1 


3 /Oo 


A 

0 


A 

0 


0 


0 


0 


0 


K/tnnnn/t nooo -r nc 

MUUUU4U /ZB.BUj 


1 /U36 


A 

0 


A 

0 


A 

0 


0 


0 


0 


\ACiOC\f\An5i i r^-r> i n 

lvlUUUU'tUo 1 \s.LJ I U 


1 CAAQ 

i ouoy 


A 
U 


A 

0 


A 

0 


0 


0 


0 


IV1UUUU4UO 1 V^.JL/ 1 Z 


14J7 1 


A 
0 


A 

0 


A 

0 


A 

0 


A 

0 


A 

0 


\a n n n n a n ox r\ ■ n n< 

MUUUUh-UoOL/.vjUO 


yzo5 


A 
0 


A 

0 


0 


0 


0 


0 


MonnnAnR7n» Am 

IVlvUUUH UO / U.AU 1 


/icon 

OooU 


A 
0 


A 

0 


A 

0 


A 

0 


A 

0 


A 

0 


MUUUU4Uy3JJ.r> iZ 


5325 


1 
1 


1 

l 


0 


1 


0 


-a 

1 


Mnnnn/tncnrvT* 1 o 

lvlUUUU4UyjL/.£> iz 


<^OC 
53Z5 


1 
1 


1 

1 


A 

0 


1 


A 

0 


l 


M00004105C:A04   7221        0                 0                 0                 0                 0                 0




49^7 


n 

\j 


A 

u 


A 
U 


A 

u 


u 


A 

u 


M00004111D:A08   6874        0                 0                 1                 0                 0                 0
M00004114C:F11   13183       0                 0                 0                 0                 0                 0
M00004138B:H02   13272       0                 0                 0                 0                 0                 0
M00004146C:C11   5257        0                 1                 0                 0                 0                 0

Table 6 All Differential Data for Libs 15-20

Clone Name       Cluster ID  Clones in Lib15   Clones in Lib16b  Clones in Lib17   Clones in Lib18   Clones in Lib19   Clones in Lib20


A4AHAA/1 K1 rvRAC 


1 £077 


A 

u 


A 

u 


A 


A 


A 

u 


A 




OH J J 


A 
U 


A 
U 


A 
U 


A 
U 


A 
U 


A 
\J 


Monona 1 f\QC-c 1 9 


JJ 1 37 


A 
U 


A 
U 


A 
U 


A 


A 
U 


A 
U 


\A(\C\C\(\A 1 7 1 FVRfn 


4yuo 


A 
U 


A 
U 


A 

u 


A 


A 
U 


A 
U 


IvlUUUVJH'l /Zv^.L/Uo 


1 14:74 


A 
U 


A 
U 


A 


A 


A 


A 

u 


JVIUUUU41 o jL.UU / 


lOJVZ 


n 
U 


A 

u 


A 
U 


A 


A 


A 
U 


1V1WUU4 1 OJ^/.LUj 


1 \AA 7 
1 144j 


A 

U 


A 
U 


A 


A 

V 


A 
U 


A 
U 


iVlUUUU4iy /U.riUl 


Q7 1 A 


A 
U 


A 


A 
U 


A 


A 
U 


A 
0 


1V1UUUU4ZU jD.I/ 1 z 


14j 1 1 


A 
U 


A 


A 
U 


A 
U 


A 
U 


A 
U 


A/fAAftA/IO 1 OD •r i A7 


777A 

zj /y 


A 
U 


A 


0 


A 

0 


A 


A 
0 


iviuuuuhZ 1 4i^.rilD 


1 143 I 


A 


A 


A 

u 


A 

u 


A 
U 


A 
U 


\/fAAAA/i77'2 a -ni a 

MUUUU4ZZJA.U 1U 


iOy 15 


A 
U 


A 
U 


A 
U 


A 

0 


A 
U 


A 
U 


A /T A A A A A 7 7 1 13 • FlAQ 


foyy 


A 


A 
U 


0 


A 


A 


A 
U 




lzV /I 


a 
U 


A 


0 


A 

0 


A 

0 


A 


A>f AAAfMOOQD .CAO 


o4jj 


A 


u 


A 

0 


0 


A 

0 


A 

0 


\AC\ftf\ftA07 AT3 ■r i A7 


/zlz 


A 
U 


A 

u 


A 


A 

0 


A 


A 


IV A A Art A A ^/iC\T\ .T\ A/C 


/i AAC 


A 


0 


0 


0 


0 


0 


lV/TAAAA/l77</ r,| -/"" , 1 1 


loy 14 


A 

u 


A 

u 


A 

0 


A 

0 


A 

0 


A 

0 


\>TAAfiA/10C7'D. A A/1 


14Z5o 


A 


0 


A 

0 


0 


0 


0 


\/IAAAA/17B<T> .CAO 


joU2U 


A 


A 


A 

0 


0 


A 

0 


A 

0 


\/fAAfiA/i7Q^rvi7i 7 


1 £A01 


A 
U 


A 

u 


A 


A 

0 


A 

0 


A 

0 


A /f AAAA>1 '"ItMLf^ ,XJLC\n 


13046 


0 


0 


0 


0 


0 


0 


M00004307C:A06   9457        0                 0                 0                 0                 0                 0
M00004312A:G03   26295       0                 0                 0                 0                 0                 0
M00004318C:D10   21847       0                 0                 0                 0                 0                 0
M00004372A:A03   2030        0                 0                 0                 0                 0                 0
M00004377C:F05   2102        0                 0                 0                 0                 0                 0

Table 7 All Differential Data for Libs 12-14



Docket No. 



Clone Name       Cluster ID  Clones in Lib12  Clones in Lib13  Clones in Lib14
M00001340B:A06   17062       0                0                0
M00001340D:F10   11589       0                0                0
M00001341A:E12   4443        4                2                0
M00001342B:E06   39805       0                0                0
M00001343C:F10   2790        0                0                0
M00001343D:H07   23255       0                0                0
M00001345A:E01   6420        0                0                0
M00001346A:F09   5007        0                0                0
M00001346D:E03   6806        0                1                1
M00001346D:G06   5779        0                0                0
M00001346D:G06   5779        0                0                0
M00001347A:B10   13576       0                0                0
M00001348B:B04   16927       0                0                0
M00001348B:G06   16985       0                0                0
M00001349B:B08   3584        0                0                0
M00001350A:H01   7187        0                0                0
M00001351B:A08   3162        0                0                1
M00001351B:A08   3162        0                0                1
M00001352A:E02   16245       0                0                0
M00001353A:G12   8078        0                0                0
M00001353D:D10   14929       0                1                0
M00001355B:G10   14391       0                0                0
M00001357D:D11   4059        0                0                0
M00001361A:A05   4141        1                2                1
M00001361D:F08   2379        0                0                0
M00001362B:D10   5622        0                2                1
M00001362C:H11   945         0                0                0
M00001365C:C10   40132       0                0                0
M00001370A:C09   6867        0                0                0
M00001371C:E09   7172        0                0                1
M00001376B:G06   17732       2                0                0
M00001378B:B02   39833       0                0                0
M00001379A:A05   1334        0                0                0
M00001380D:B09   39886       0                0                0
M00001382C:A02   22979       1                0                0
M00001383A:C03   39648       0                0                0
M00001383A:C03   39648       0                0                0
M00001386C:B12   5178        0                0                0
M00001387A:C05   2464        0                0                0
M00001387B:G03   7587        0                0                0
M00001388D:G05   5832        0                0                0
M00001389A:C08   16269       2                0                0
M00001394A:F01   6583        0                0                0

Clone Name





MOOOO 1 395 A*C03 




MOOOO 1396A*C03 




MOOOO 1 402 A'E08 




M00001407B-D11 




M00001409CD12 




MOOO0 1 41 0A-D07 




M00001412RR10 




M0000141 SAH06 




M00001416A-FT01 




M00001416RFF1 1 




MOOOO 1 4 1 7 A -F09 




MOOOO 1 4 1 RR-F01 




MOOOO 1 4 1 8D-R06 


<• • t? 

L_f 


MOOO0 1 49 ir*-F01 


* t" 
jt 


M0000 1 49^R-F07 




M0000 1 494R-G09 




MOOOO 1 49SR- W08 




moooo 1 496n-ro8 


ill 


MOOOO 1498 A -FT 1 0 


*** 1' 


M0000 1 490 A H04 


g 


M0000 1 49QR- A 1 1 


* jj 


M0000 1 49QD-D07 


rs ;: - 

S "5. 


MOOOO 1 A1QC • FO 8 


5- 

i 9% 


M0000 1 449P-FJ07 

ivl\J\J\J\J 1HHZV-/..L/V/ / 


■ ■■ ^* IT 


M00O0 1 44S A -FOS 




MOOOO 1 446 A -FOS 


! %k 


MOOOO 1447 A -G03 




M0000 1 4481>rOQ 








M00001449A-A19 




MOOOO 1449A*R 12 




M00001449A-D12 




MOOOO 1449A-G10 




MOOOO 1449C-D06 




M00001450A-A02 




M00001450A-A11 




M000014^0A-R12 




moooo 1 4^0 a »no» 




MOOOO 1452A:B04 




M00001452A:B12 




MOOOO 1452A.D08 




MOOOO 1452A:F05 




MOOOO 1452C:B06 




M00001453A:E11 




MOOOO 1453C:F06 



Table 7 All Differential Data for Libs 12-14

Cluster ID Clones in Clones in 





T ih19 


T ih 


4016 


o 


o 


4009 


9 


o 




0 


u 




n 


0 
u 


QS77 


n 

u 


n 
u 


700S 


0 


A 


XSSI 


n 

u 


A 


lJJJ 0 


A 
U 


A 
U 


1(\1A 
/ o /*+ 


A 
U 


A 




1 

1 


A 




A 

u 


A 

U 


0QS9 


A 
U 


A 

U 


OJZO 


A 


A 

u 


CK77 


A 

V 


A 




A 
U 


A 

u 


10470 


A 


A 




A 
U 


A 

U 


49^1 


A 


A 


841 R9 


A 
U 


A 


9707 


A 
U 


A 

U 


4^1 S 


A 
U 


A 


40^09 


A 


A 


400S4 


A 


A 

U 


167^1 
ID/ j 1 


A 


A 

U 


1 3S19 


A 


A 

u 


7801 


A 


1 
1 


10717 


A 
U 


A 
U 


o 


7 


c 
O 




1 


A 


JO J / 


A 
U 


A 


41fvU 


A 
U 


A 
U 


JUO 1 


1 
1 


A 




A 

V 


A 


861 1 0 


A 
U 


A 




A 


l 
i 




A 
U 


A 




A 


A 


979S0 


A 
U 


A 


5/fJ9C 
04jZo 


A 
U 


A 


86859 


0 


0 


1120 


0 


0 


85064 


0 


0 


16970 


1 


0 


16130 


0 


0 


16653 


0 


0 

Clones in Lib14

0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
9 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

Table 7 All Differential Data for Libs 12-14

Clone Name       Cluster ID  Clones in Lib12  Clones in Lib13  Clones in Lib14


\>r aaaa i a <^a a . a no 
MUUUU 1 4j4A. PiSJy 


CO 1 AO 
OJ 1UJ 


A 

0 


A 

0 


A 

0 


iV/f aaaai a ^ati-c* 1 o 


7AA^ 

/UUj 


A 
U 


A 

0 


A 

0 


\jtaaaai /i-c/fTY./^A^ 


/CO A 

689 


0 


0 


l 


\/f aaaai /i cc a .caq 
MUUUU 1 4 jjA.e,U9 


1 3238 


A 

0 


0 


0 


\jfAAAA1 /I C CT) .T7 1 ^ 

MUUUU1455B:bl2 


13U72 


0 


0 


0 


1V^AAAA1 /t^TVCAO 

MUUUUI40jJJ.rUy 


AO QO 

9283 


A 

0 


A 


0 


A^AAAAl /ICCFV.rArt 

MUUUU 14jjD:r 09 


9283 


0 


0 


0 


A/TAAAA 1 A £f\ A .UTAiC 

MUUUU 1 46UA:r Uo 


244 o 


0 


0 


0 


lV/f AAAA1 AC(\ A .171 O 

MUUUU 1 4oUA:r 1 2 


OA/1 AO 

39498 


0 


0 


0 


AAAAAA1 A .T"\AiC 

MUUUU 1 46 1 AlDUo 


1 C? 1 

1531 


0 


0 


1 


\^AAAA1 /t/CO/^.OI 1 

MUUUU 1 40 3C.B1 I 


1 A 

19 


17 


32 


/~fc 1 

31 


MUUUU1465A:B1 1 


1 A 1 A C 

30145 


0 


0 


0 


\jfAAAA1 /I C£ A .r?A'7 

MUUUU 1 466A:bU7 


4275 


0 


0 


0 


MUUOU 1 467A:B07 


38759 


0 


0 


0 


\>f aaaa 1 ^ an a .r^Ayi 
MUUUU 1 46 7 A:DU4 


39508 


0 


0 


0 


M0000 1 467A:D08 


16283 


0 


0 


0 


\/fAAAA1 ACH A ,r\AO 

MUUUU 1 46 / A:DU8 


16283 


0 


0 


0 


\yTAAAA1 A CI A . T? 1 A 

MUUUU 1 467A :E 1 0 


O A A A A 

39442 


0 


0 


0 


\/TAAAA1 A£0 A .T?A C 

MU0UU1468A:r05 


Of o A 

7589 


0 


0 


0 


\>TAAAA1 A CC\ A .01 A 

MUUUU 1469A:C 10 


1 ^AO 1 

12081 


0 


0 


0 


X /T AAA A 1 /|/n A TT1 ^ 

MOOOO 1 469A:Hl 2 


19105 


0 


0 


0 


1V/TAAAA1 >|nA A -T»1 A 

MUUUU1470A:B1 0 


1037 


0 


0 


0 


M00U0l470A:C04 


39425 


0 


0 


0 


IV/fAAAAl AT\ A .T5A1 

MUUUU14 / 1A:BU1 


39478 


0 


0 


0 


K 1AAAA1 ^01 T~\. A AC 

MUUUU 1 48 1 D: AU5 


7985 


0 


0 


0 


X/TAAAA1 /1AAT> AA>I 

MUUUU 1490B:C04 


18699 


0 


0 


0 


A/1AAAA1 A C\AT\.T2(\C 

MUUUU 1 494D:r U6 


7206 


0 


0 


0 


\>taaaa i Ann a ./~' , ao 

MUUUU 1 497A: Cj02 


2623 


1 


0 


0 


\/fAAAA1 /IOOT3. A 1 1 

Muuuui4yyt):Ai l 


1 ACO A 

1 053 9 


0 


1 


0 


Avf HAAA1 CAA A ,PAC 

MUUUU 1 jUUA:CU5 


5336 


0 


0 


0 


NyTAAAAl CAA A .CI 1 

MUUUU lDUUA:bl 1 


2623 


1 


0 


A 

0 


\jTAAAA1 ^finr.CA/l 

MUUUU 1 jUUC*h.U4 


A /I >l O 

9443 


0 


0 


0 


N/fAAAAl cn i n.pno 


9685 


0 


0 


0 


1V/TAAAA1 ZCiAC** A (\H 
MUUUU 1 jU4L/* AU / 


1 A1 OC 

1U185 


0 


0 


0 


MUUUU 1 j U4C . MUO 


cc\n a 

6974 


A 

0 


0 


0 


1VAAAAA1 ^A/ITV/^A^. 
MUUUU 1 jU4D:CjU6 


£ /I OA 

6420 


A 

0 


0 


0 


maaaai ^ata-uac 

MUUUU 1 jU / A. HUD 


OA! /CO 

39168 


A 

0 


0 


0 


\/f AAAA K1 1 A ."LJA£ 

MUUUU 1 J 1 lA:rlU6 


OA/1 1 O 

394 12 


A 

0 


0 


0 


\A AAA A KHA • A AA 

MUUUU i J I2A.AU9 


O A 1 OC 


A 

0 


0 


0 


\yfAAAA1 ^ lorvr^AA 


3956 


A 

0 


0 


0 


M00001513A:B06   4568        0                0                0
M00001513C:E08   14364       0                0                0
M00001514C:D11   40044       0                0                0
M00001517A:B07   4313        0                0                0
M00001518C:B11   8952        0                0                0

Table 7 All Differential Data for Libs 12-14

Clone Name       Cluster ID  Clones in Lib12  Clones in Lib13  Clones in Lib14


a/taaaai coo a *r*f\A 
MUUUU i jzoA.CU4 


155 1 


1 


2 


2 


iviuuuu 1 jZoa, r \)y 


1 BCK7 


A 

u 


A 
0 


A 
0 


a/taaaai coBR'UA/'i 

MUUUU 1 3Zot).riU4 


o5jO 


u 


0 


A 

0 


a/taaaa K'jt a «nA i 

NvJ\j\j\) ijJi A.UU 1 


joUoj 


u 


A 

0 


0 


a>iaaaai cjoq. a c\£l 

MUUUU 1 D JZo. AUO 


jyyu 


u 


0 


0 


A/f AAAA1 CJ^ A -C^ 1 1 
MUUUU A. 1^1 1 


242 o 


0 


0 


0 


a/iaaaai <ia a -rri/i 

MUUUU1 J.54A.CU4 




u 


A 

0 


0 


A/f AAAA 1 A .F\AA 

MUUUU 1 ->34A:LNJy 


5U97 


0 


0 


0 


A/TAAAAI c2/i a «t?ao 
MUUUU I j .54 A. r \)y 


5321 


4 


7 


6 


A/fAAAAl CO/IP. A A1 

MUUUU 1 j J4C: AU 1 


/I 1 1 c\ 

4119 


0 


0 


0 


aaaaaai esc a .tdai 
MUUUU 1 555 A:r>U 1 


7665 


0 


2 


4 


A/TAAAA1 COf A .riA/T 

MUUUU 1 535 A:C06 


20212 


0 


0 


0 


\vfAAAA1 C5C A . T? 1 A 

MUUUU 1 535A:r 1 U 


1 A/1 

39423 


0 


0 


0 


"Nvf AAAA1 A .DAl 

MUUUU 1j36A:B07 


2696 


0 


0 


0 


"KvfAAAAl CO/C A .r<AO 

MUUUU 1 536A:C08 


39392 


0 


0 


0 


Avf AAAA1 £11 A .17 1 O 

MUUUU153 /A:r 12 


39420 


0 


0 


0 


"\yfAAAA1 CO 7D ./~» AT 

MUUUU 1 j J /BrCjU/ 


3389 


0 


0 


0 


Avf AAAA 1 C/1A A .T"YAiC 

MUUUU 1 54UA:DU6 


8286 


0 


yv 

0 


0 


li /f A A A A 1 f yll A rvA^ 

MUUUU1541A:D02 


3765 


0 


0 


0 


A/TAAAA1 CA 1 A •T?A'7 

MUUUU 1 54 1 A:r U / 


22085 


0 


0 


A 

0 


AvTAAAA1C/l1 A .TJTAO 

MUUUU 1 54 1 A:HU3 


1 A 1 '"T /J 

39174 


0 


0 


0 


A/f AAAA K/IO A . A AA 

MUUUU 1 54ZA:AU9 


221 13 


0 


0 


0 


\/fAAAA1 C A**i A •TTAZ' 

MUUUU 1 542A:b06 


39453 


0 


0 


0 


A/1AAAA1 C/1/1 A • UA'J 

MUUUU i 544A:liUi 


T)i *7A 

12170 


0 


0 


0 


A/TAAAA1 ZAA A »/^AO 

MUUUU 1 544A:OU2 


19829 


0 


0 


0 


AyfAAAAl C/t/fQ.QAT 

MUUUU 1 D44Jt>.£>U / 


6974 


0 


0 


0 


A/fAAAA1 C/1C A .PA^ 

MUUUU 1 545A:CU3 


1 AO C C 

19255 


0 


0 


0 


A/f A AAA 1 C /fC A ,r\AO 

MUUUU 1 545 A:DU8 


13864 


0 


0 


0 


A/f AAA A 1 C/1 C A .#"1 1 1 
MUUUU 1 D40A:0 1 1 


1267 


0 


0 


0 


AA AAAA 1 C/1 Q A . C 1 A 

MUUUU 1 54oA:Jb 1 U 


5892 


0 


l 


0 


A/f AAAA KilO A .uno 
MUUUU 1 J45A.HUV 


1 ACO 

1U58 


1 


3 


0 


A/IAAAAI C/1A A ,DAO 

MUUUU 1 549A:BU2 


4015 


0 


l 


0 


A/f AAAA 1 C/IQ A .HAQ 
MUUUU 1 D4yA.L7Uo 


1 AA/I >! 

1U944 


1 


0 


0 


A/TAAAAI ^AQR-T7A^ 

muuuu l j4yt) . r uo 


4193 


0 


0 


0 


A/f AAAA 1 C/IOV^CA/C 

muuuu i j4yu.tvuo 


16347 


0 


0 


0 


A/f AAAA 1 CCA A • A A1 
JVIUUUU I JjUA.AU J 


7T2Q 

/259 


0 


1 

1 


0 


A/f AAAA 1 CCA A -nr\ 1 
iVlUUUU 1 DDUA.OU 1 


C 1 

5175 


l 


yv 

0 


0 


AyfAAAAl CC1 A -til A 
IVIUUUU 10 J 1 A. O 1 U 


626o 


0 


0 


1 


A/fAAAA1 CC1 A .UAC 

MUUUU 1 DM A:r U5 


OA 1 OA 

39180 


0 


0 


0 


M0000 1 1 A -HA6 


zzjyu 


u 


A 

0 


l 


M00001551C:G09   3266        0                0                0
M00001552A:B12   307         6                11               4
M00001552A:D11   39458       0                0                0
M00001552B:D04   5708        0                0                0
M00001553A:H06   8298        0                0                0

Clone Name       Cluster ID  Clones in Lib12  Clones in Lib13  Clones in Lib14




Mnnoni ^^r-pi 7 

iviuuuu i j DjD,r iz 


4j Id 


U 


A 

0 


A 

0 




JVIUUUU I JJDLJ.L/ 1 v 


ZZo 1 4 


A 
U 


A 

0 


A 

0 




Mnnnni ^<^a-ra7 

IVIUUUUl JODA. £>UZ 


jyDjy 


0 


A 

0 


A 

0 




iVI\Jkj\J\J l J DDJ\.\^\J 1 




0 


A 

0 


A 

0 




\/iaaaai ^^n«m a 

IVIUUUU 1 D DOIJ.vjl u 


A Z£ 1 
4Dol 


0 


A 

0 


0 




maaaai ^aA'Paq 

IViuUUU 1 jjOA.LU" 


VZ44 


A 
0 


1 

1 


A 

0 




\>TAnnAi a -pi 1 


1 ^77 

Id// 


A 
0 


A 

0 


2 




Mnnnm ^aa-uai 

IVIUUUU 1 JDOA.riU 1 


1 

IDoD j 


1 


1 

1 


A 

0 




IVIUUUU 1 j jOo.L/Uo 


4Joo 


3 


A 

0 


l 




maaaai ^^vi-cim 

IVIUUUU 1 J DOD.VjUZ 


i izy4 


A 

0 


A 

0 


0 




maaaai «7A .r*A7 

MUUUUl J D / A.DUZ 


/UOD 


0 


0 


0 




tv^taaaa i ^7 a >r*A7 

IVIUUUU 1 J J / A.JL/UZ 


/UOJ 


A 

0 


A 

0 


0 




\/fAAAA1 <^7A .E*A1 

iVIUUUUl JJ /A.rUl 


AiC5 C 


0 


0 


0 




\/iaaaai ^7 a «ttai 

IVIUUUUl 3D /A.rU.5 


jy4yu 


0 


0 


0 


j »J 


\yfAAAA1 <<7T3.LI1 A 


c 1 no 


0 


0 


0 


V- !» 

"<& £ 


X/TAAAA1 <^7TVT\AA 
MUUUU 1 j3 /U.L/Uy 


o/ol 


0 


0 


0 


Ji *'. »■ 


IVIUUUUI j jots.ril 1 


/jI4 


0 


0 


0 


Hi 


\vfAAAA1 ££ATVTT1 f\ 


6558 


0 


0 


0 


i irii 


MAAAA1 <£1 A •r'A^ 
IVIUUUU 1 DO 1 A. CUD 


on/1 Oi^ 


A 

0 


0 


0 


* *i 12 

V\< ]• 


\^AAAA1 C^C5r>.CA/C 

MUuu0lDO3b:r06 


102 


2 


1 


2 




iv/taaaa 1 a^dh 

IVIUUUU I Ju4A.t5 1 2 


5053 


A 

0 


0 


0 




\/fAAAA1 ^Tlf^.TJA^ 

IVIUUUU 1 j / 1 ClriUo 


5749 


A 

0 


0 


0 


? V*' 


\>fAAAA1 ^7QT3.t7A>l 
IVIUUUU i D fOD.t,\)H 


23001 


A 

0 


0 


0 




\/fAAAA1 ^70TVfVM 
IVIUUUUl J /yjJ.CUJ 


653y 


0 


ft 

0 


0 


S |]J 


\^AAAA1 ^OITV A m 
IVIUUUU 1 DojJJ^Al U 


6293 


0 


0 


0 




K/fAAOAl ^B^r^PA^ 
IVIUUUU 1 DoOCCUD 


40Zi 


A 

0 


0 


ft 

0 


Hi 

5 * * 


\4AAAA 1 ^87 A «T5 1 1 
IVIUUUU 1 Do /A.t> 1 1 


393o0 


A 

0 


0 


0 




IVIUUUU 1 Dy4£>.riU4 


zoO 


1 

1 


0 


0 




\/fAAAA1 <07/" , .lJ'A7 
MUUUU 1 jy /ClriUz 


4o37 


1 


0 


0 




IVIUUUU 1 JZ7 /JJ.CUD 


1 A/17A 

1U4/U 


A 

0 


0 


ft 

0 




\A OOAA 1 ^QR A 'Cl(\1 
IVIUUUU 1 jyoA.OUj 


loyyy 


4 


2 


6 




IVIUUUl/ 1 OU I /\. JL/Uo 


zz /y4 


A 

0 


0 


0 




K/TAOAA 1 /^A/4 A «n 1 A 
IVIUUUU I OUhA.Jd I U 


1 inn 

I3yy 


6 


3 


3 




M AOOA1 ^A4 A »PA^ 

iviuuuu 1 ounA.ruj 


jyjy i 


A 

0 


0 


0 




IVIUUUU I OU /A.JC/1 1 


1 140J 


A 

0 


0 


ft 
0 




MAOAA 1 ^A8 A 'Tl(\1 

IVIUUUU J OUO/\.I>UJ 


7QA7 

/oUZ 


A 

0 


0 


0 




l\l\J\J\J\J 1 OUOO.I!*UJ 


77 1 ^< 
ZZi j J 


A 
0 


0 


ft 

0 




MAAAA1 ZIAC-VI A 
iviuuuu i o lHK^.r 1 U 


1 "5 1 C7 

1315 / 


A 

0 


0 


0 




M0000 1 (\ 1 7r-P A7 

IVIUUUU lO J f\s.LJ\)Z 


1 7AA/1 
1 /UU4 


ft 

0 


A 

0 


ft 

0 




iviuuuu i u iy\s.r i z 


4U3 14 


ft 

u 


A 

0 


A 

0 




M00001621C:C08   40044       0                0                0
M00001623D:F10   13913       0                0                0
M00001624A:B06   3277        0                0                0
M00001624C:F01   4309        0                0                0
M00001630B:H09   5214        0                1                2
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IVIUUUU 1 0*+*+V^.x5U / 








IVIUUUU 1 D 4 tOV»/./\U 1 




M00001 6S71>rm 

iviuuuu 1 UJ / L'.V^Uj 




lvf 00001 6^7TYF0ft 
IVIUUUU i O J /L/.rUo 




MOOOO 1 (\fT>C- A HQ 
iviuuuu i uozv^./vuy 




moooo 1 GfR a -fo^. 

IVIUUUU 1 UO.j/\.JDU4 




MOOO0 1 6£QR*F07 
iviuuuu i oo^D.r uz 




MOOOO 1 670P-R07 
iviuuuu i u /uv^.nuz. 




IVIUUUU 1 D /3CHUZ 




moooo 1 £7^ a -r^no 




iVIUUUU 1 0 /OJj.r UD 


* ■' 
+. 


moooo 1 a77p-fi a 

IVIUUUU 10 / /v^.Jil U 




MOAOA 1 /C77TV A AT 
MUUUU 10/ /D: AU / 




MUUUU 1 0 /oU.r Iz 


M 


moooo i A7Q a • a r\a 
MUUUU 1 o /y/\. AUO 




\vf aaaa 1 ^to a -it 1 a 
IVIUUUU 1 0 fyfx.r I U 


i S? 
U 2 


moooo i &iqt* . tr a i 
muuuu l o /yts.r u i 


j .■. 

W ':} 


\^AAAA 1 /CTG^.TTA 1 

MUUUU i 0 /yt.r U 1 


s 3? 
? :? s - 

t* 


MOOOO 1 ^TOTVPini 
IVIUUUU 1 0 JyD.DKjD 




IVIUUUU I 0 /y\J.D\)j 


* 1 • 

Si« I; 


IVIUUUU lOoUU.rUo 


i 3 
i si 


moooo 1 ^aor^Ri o 

IVIUUUU 1 OoZL^.rJ 1 z 




moooo i ^e.£ a -paa 

IVIUUUU 1 OoOA.IlUO 




moooo i /^csp-fao 
IVIUUUU 1 oooc r uy 


* "* £ 


MOOO0 1 ^oir , »r^m 
iviuuuu i oyji^.vju i 


: ^ 


MOOOO 1 7 1 £T>14fK 




MOOon^T/t i rvr'Ao 
ivi\j\j\j\jj in lu.uuy 




mooooi 747n -rrK 

1V1UUUUJ /n/L/.^Uj 




moooo^7^qr-raq 

IVIUUUU j /jyD.DUy 




moooo^^p-rob 

IVIUUUU J) /DZL.DUo 




M00001 1&\ A -F0£ 




moooo^ ii AC- a o^ 




ivlUUUU_? / "U^.L/UJ 




MOOOO'} R96R ■ A On 




1V1UUUUJ 0 J J/\.IjU J 




IVIUUUU J /LJ.JWJ l 




A/fAAAA1C2G A -T\AO 

MUUUUJojyA.JJuo 




M00003844C:B1 1 




M00003846B:D06 




M00003851B:D10 




M00003853A:D04 




M00003853A:F12 




M00003856B:C02 




M00003857A:G10 



Table 7 All Differential Data for Libs 12-14



Cluster ID 


Clones in 


Clones in 




LiDlZ 


Lib 13 


jy 1 / i 


0 


0 


1 Q7£7 

lyzo / 


A 
0 


A 

0 


4oo!> 


0 


0 


7^701 
ZJZU 1 


A 

0 


A 

0 


7^7/iA 
/O/DU 


0 


A 

0 


ZJZ 1 o 


0 


A 

0 


/Uz 


0 


0 


A/I /^C 


0 


0 


1 A1 £H 


0 


A 

0 


/Ul J 


0 


0 


OTTO 


0 


0 


1 1460 


2 


0 


14oz / 


0 


0 


/J /u 


0 


0 


4416 


1 

1 


2 


666U 


0 


0 


26o75 


0 


0 


6298 


0 


0 


78091 


0 


0 


10751 


0 


0 


1 MTC 1 


0 


0 


i ncin 
10539 


0 


■a 

1 


1 /055 


0 


0 


4ozz 


0 


0 


^ oo 


0 


0 


4 J 93 


A 

0 


0 


0/z5z 


A 

0 


0 


/1A1 AO 

40108 


0 


0 


1 1 AHC 

1 14/6 


A 

0 


0 


/CAT 


A 

0 


0 


1 TAT£ 
1 /O/O 


A 

0 


0 


1 1 AC 

3 IUo 


A 

0 


0 


£TOAT 

o/yu / 


A 

0 


0 


joiy 


A 

0 


1 


1 UjU 


A 

0 


0 


7 1 C77 

zlo/ / 


A 

0 


0 


/cyy 


A 
0 


A 

0 


77O0 

/ /y© 


A 
0 


0 


/C^7Q 

oj3y 


A 

0 


0 


/COT/1 

68 /4 


0 


0 


13595 


0 


0 


5619 


0 


1 


10515 


0 


0 


4622 


0 


0 


3389 


0 


0 



338 



Docket No. 1480P 

Clones in Lib14

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 
0 



Table 7 All Differential Data for Libs 12-14

Clone Name       Cluster ID  Clones in Lib12  Clones in Lib13  Clones in Lib14


M00003857A:H03


4718 

T / It) 


n 
u 


A 


A 

u 


M00003871C:E02


4573 


V 


A 


A 

u 


M00003875B:F04


12977 


n 
U 


A 


A 

u 


M00003875B:F04


12977


A 
U 


A 


A 

u 


M00003875C:G07


8479


1 
1 


A 


A 


M00003876D:E12


7798


A 
U 


A 

U 


A 


M00003879B-C11 




/I 


o 
o 


3 


M00003879B-D10 


31 S87 


A 


A 
[) 


A 

0 


M00003879D-A02 


14^07 

14JU / 


A 
U 


A 
U 


A 

0 


M00003885C-A02 




A 
U 


A 
U 


A 

0 


M00003885C-A0? 


1 J J /o 


A 
U 


A 
U 


A 


M00003906CE10 


Q78S 


A 
U 


A 


A 

0 


M00003 907D ■ A 09 


1QQAQ 


A 


A 
U 


A 

0 


M00003907D-H04 


I 6^1 7 

I I / 


U 


A 


A 

0 


M00003909T>r03 


00 /Z 


U 


A 


A 

0 


M 00003 9 1 2R ■ D0 1 




n 
U 


u 


A 

0 


M00003914r-FOS 




U 


1 

1 


A 

0 


M00003099 A -F06 




A 
U 


0 


0 


moooo^o^r a -wn? 


loyj / 


u 


0 


0 


lviuuuu j y j oA.nuz 


1 0QC7 


A 


0 


0 


MOOOO^Qsxr-m o 


H\J*tJJ 


A 

u 


0 


A 

0 


MOOOO^Q^flP-fn 0 




A 

0 


0 


0 


Mfifi0fn06RR'FA6 


Z44o(> 


0 


0 


0 


M00003 QlftC -RAO 


4U1ZZ 


A 


0 


0 


M000fnQ74TVFA7 


010 1 A 


u 


0 


0 


MOooo^074J>Ff09 


Z.DJJO 


A 


0 


0 


M00003Q7^A-O1 1 


1 9/11Q 


A 

u 


A 


0 


M0000397RR-O0S 


jOyj 


A 

u 


A 


0 


M00003081 A-F1 0 




A 

u 


A 

0 


0 


M00003989r-r09 


ZH- 


z 


4 


A 

0 


M00003983A-A05 


oi 

7 1 


A 

u 


A 


A 

0 


M000040? RT> A 06 


61 74 
01Z4 


A 

u 


A 

0 


0 


M000040?8r)-ro5 




A 

u 


t 

I 


A 

0 


M0000403 1 A • A 1 9 


0061 


A 

u 


A 


A 

0 


M000040'} 1 A • A 1 7 


QA61 
y\)\ji 


A 

u 


A 


0 


M00004035C- A 07 




A 
U 


A 


0 


1W0000403 SD -R06 


1 7m 6 


A 
U 


A 

0 


A 

0 


M000040SQATj06 


Oil / 


A 
U 


A 

0 


0 


M00004068B-A01 


3706 


A 
U 


A 


A 
0 


M00004072B:B05   17036       0                0                0
M00004081C:D10   15069       0                0                0
M00004081C:D12   14391       0                0                0
M00004086D:G06   9285        0                0                0
M00004087D:A01   6880        0                0                0
M00004093D:B12   5325        0                0                0

Table 7 All Differential Data for Libs 12-14

Clone Name       Cluster ID  Clones in Lib12  Clones in Lib13  Clones in Lib14
M00004093D:B12   5325        0                0                0
M00004105C:A04   7221        0                0                0
M00004108A:E06   4937        0                0                0
M00004111D:A08   6874        0                0                0
M00004114C:F11   13183       0                0                0
M00004138B:H02   13272       0                0                0
M00004146C:C11   5257        0                0                1
M00004151D:B08   16977       0                0                0
M00004157C:A09   6455        0                0                0
M00004169C:C12   5319        0                0                0
M00004171D:B03   4908        0                0                0
M00004172C:D08   11494       0                0                0
M00004183C:D07   16392       0                0                0
M00004185C:C03   11443       2                0                0
M00004197D:H01   8210        0                0                0
M00004203B:C12   14311       0                0                0
M00004212B:C07   2379        0                0                0
M00004214C:H05   11451       0                0                0
M00004223A:G10   16918       0                0                0
M00004223B:D09   7899        0                0                0
M00004223D:E04   12971       0                0                0
M00004229B:F08   6455        0                0                0
M00004230B:C07   7212        0                0                1
M00004269D:D06   4905        0                0                0
M00004275C:C11   16914       0                0                0
M00004283B:A04   14286       0                0                0
M00004285B:E08   56020       0                0                0
M00004295D:F12   16921       0                0                0
M00004296C:H07   13046       0                0                0
M00004307C:A06   9457        1                0                0
M00004312A:G03   26295       0                0                0
M00004318C:D10   21847       0                0                0
M00004372A:A03   2030        0                0                0
M00004377C:F05   2102        0                0                0